Abstract

A fundamental problem faced in the design of almost all packet networks is that of 
efficient operation — of reliably communicating given messages among nodes at mini- 
mum cost in resource usage. We present a solution to the efficient operation problem 
for coded packet networks, i.e., packet networks where the contents of outgoing pack- 
ets are arbitrary, causal functions of the contents of received packets. Such networks 
are in contrast to conventional, routed packet networks, where outgoing packets are 
restricted to being copies of received packets and where reliability is provided by the 
use of retransmissions. 

This thesis introduces four considerations to coded packet networks: 

1. efficiency, 

2. the lack of synchronization in packet networks, 

3. the possibility of broadcast links, and 

4. packet loss. 

We take these considerations and give a prescription for operation that is novel and 
general, yet simple, useful, and extensible. 

We separate the efficient operation problem into two smaller problems, which we 
call network coding — the problem of deciding what coding operation each node should 
perform given the rates at which packets are injected on each link — and subgraph 
selection — the problem of deciding those rates. Our main contribution for the network 
coding problem is to give a scheme that achieves the maximum rate of a multicast 
connection under the given injection rates. As a consequence, the separation of 
network coding and subgraph selection results in no loss of optimality provided that 
we are constrained to only coding packets within a single connection. Our main 
contribution for the subgraph selection problem is to give distributed algorithms that 
optimally solve the single-connection problem under certain assumptions. Since the 
scheme we propose for network coding can easily be implemented in a distributed 
manner, we obtain, by combining the solutions for each of the smaller problems, a 
distributed approach to the efficient operation problem. 



We assess the performance of our solution for three problems: minimum-transmission
wireless unicast, minimum-weight wireline multicast, and minimum-energy
wireless multicast. We find that our solution has the potential to offer significant effi- 
ciency improvements over existing techniques in routed packet networks, particularly 
for multi-hop wireless networks. 

Thesis Supervisor: Muriel Medard 

Title: Esther and Harold Edgerton Associate Professor of Electrical Engineering 



Preface 



Vladimir Nabokov once opined, "My loathings are simple: stupidity, oppression, 
crime, cruelty, soft music. My pleasures are the most intense known to man: writing
and butterfly hunting." I share all of Nabokov's loathings, but only one of his
pleasures, and that began only recently. Of course, the lepidoptera I've been involved
with are none that Nabokov would recognize or, I imagine, much revere. Nevertheless,
the butterflies to which I refer, from the butterfly network of Ahlswede
et al. (see Figure 7 of [2]) to its wireless counterpart (see Figure 1 of [TS]) to further
generalizations, have certainly given me a great deal of pleasure since I began
investigating network coding in the spring of 2003.

This thesis represents the culmination of my work over the last three years, which 
began with the simple question, how would all this actually work? I was intrigued by 
network coding. But I couldn't quite reconcile it with the way that I understood data 
networks to operate. So I thought to take the basic premise of network coding and 
put it in a model that, at least to me, was more satisfying. The following pages lay 
out a view of coded packet networks that, while certainly not the only one possible, 
is one that I believe is simple, relevant, and extensible — I can only hope that it is 
sufficiently so to be truly useful. 

Various parts of the work in this thesis appear in various published papers [211 
EOlinilZSIZSllZllZniEZllZHland various as yet unpublished papers [73 EH]. A brief 
glance at the author lists of these papers makes it evident that I cannot claim
sole credit for this work; many others are involved.

My adviser, Professor Muriel Medard, is foremost among them. I would like 
to thank her for all that she has taught me and all that she has done to aid my
titude of demands on her time continues to amaze and inspire me. I would like to
thank also my thesis readers, Professors Michelle Effros, Ralf Koetter, and John
Tsitsiklis. All have contributed helpful discussions and advice. I would like to thank Ralf
enthusiasm have been invaluable.
I am. But I want to keep this to those to whom I am really indebted the most: Mum,

Introduction



A fundamental problem faced in the design of almost all packet networks is that
of efficient operation, i.e., of reliably communicating given messages among nodes
at minimum cost in resource usage. At present, the problem is generally addressed 
in the following way: messages admitted into the network are put into packets that 
are routed hop-by-hop toward their destinations according to paths chosen to meet 
the goal of efficiency, e.g., to achieve low energy consumption, to achieve low latency, 
or, more generally, to incur low cost of any sort. As packets travel along these paths, 
they are occasionally lost because of various reasons, which include buffer overflow, 
link outage, and collision; so, to ensure reliability, retransmissions of unacknowledged 
packets are sent either on a link-by-link basis, an end-to-end basis, or both. This 
mode of operation crudely characterizes the operation of the internet and has held 
sway since at least its advent. 

But much has changed about packet networks since the advent of the internet. 
The underlying communications technologies have changed, as have the types of ser- 
vices demanded, and, under these changes, the mode of operation described above 
has met with difficulties. We give two examples. First, while wireline communi- 
cations were once dominant in packet networks, wireless communications involving 
nodes on the ground, in the air, in space, and even underwater are now increasingly
prevalent. In networks where such wireless links are present, this mode of operation
can certainly be made to work, but we encounter problems — most notably with the 
use of retransmissions. Wireless links are highly unreliable compared to wireline ones 
and are sometimes associated with large propagation delays, which means that, not 
only are more retransmissions required, but packet acknowledgments are themselves 
sometimes lost or subject to large delay, leading to substantial inefficiencies from the 
retransmission of unacknowledged packets. Moreover, hop-by-hop routing fails to ex- 
ploit the inherent broadcast nature often present in wireless links, leading to further 
inefficiencies. 

Second, while unicast services were once the norm, multicast services are now 
required for applications such as file distribution and video-conferencing. For
multicast services, hop-by-hop routing means routing over a tree, which is difficult to do
efficiently — finding the minimum-cost tree that spans a multicast group equates to 
solving the Steiner tree problem, which is a well-known NP-complete problem [TmilUSj . 
Moreover, if there are many receivers, many retransmitted packets may be needed, 
placing an unnecessary load on the network and possibly overwhelming the source. 
Even if the source manages, packets that are retransmitted are often useful only to a 
subset of the receivers and redundant to the remainder. 

The problems we mentioned can and generally have been resolved to some degree 
by ad hoc methods and heuristics. But that is hardly satisfactory — not only from an 
intellectual standpoint, since ad hoc solutions do little for our understanding of the 
fundamental problem, but also from a practical standpoint, since they tend to lead 
to complex, inefficient designs that are more art than science. Indeed, as Robert G. 
Gallager has commented, "much of the network field is an art [rather than a science]" 
[41]. And while it is evident that engineering real-world systems is an activity that
will always lie between an art and a science, it is also evident that the more we base 
our designs on scientific principles, the better they will generally be. 

In this thesis, therefore, we eschew such "routed" packet networks altogether in favor
of a new approach: we consider coded packet networks, which are generalizations of routed
packet networks where the contents of outgoing packets are arbitrary, causal functions 
of the contents of received packets. In this context, we consider the same fundamental 
problem, i.e., we ask, how do we operate coded packet networks efficiently? 

We present a prescription for the operation of coded packet networks that, in 
certain scenarios (e.g., in multi-hop wireless networks), yields significant efficiency 
improvements over what is achievable in routed packet networks. We begin, in
Section 1.1, by discussing coded packet networks in more detail and by clarifying the
position of our work; then, in Section 1.2, we describe our network model. We outline
the body of the thesis in Section 1.3.

1.1 Coded packet networks 

The basic notion of network coding, of performing coding operations on the contents of 
packets throughout a network, is generally attributed to Ahlswede et al. [2]. Ahlswede
et al. never explicitly mentioned the term "packet" in [2], but their network model,
which consists of nodes interconnected by error-free point-to-point links, implies that 
the coding they consider occurs above channel coding and, in a data network, is 
presumably applied to the contents of packets. 

Still, their work is not the first to consider coding in such a network model. Earlier 
instances of work with such a network model include those by Han and Tsitsiklis 
|lU3j . But the work of Ahlswede et al. is distinct in two ways: First, Ahlswede et 
al. consider a new problem — multicast. (The earlier work considers the problem of 
transmitting multiple, correlated sources from a number of nodes to a single node.) 
Second, and more importantly, the work of Ahlswede et al. was quickly followed by 
other work, by Li et al. [HI] and by Koetter and Medard [02] , that showed that codes 
with a simple, linear structure were sufficient to achieve capacity in the multicast 
problem. This result put structure on the codes and gave hope that practicable 
capacity-achieving codes could be found.

The subsequent growth in network coding was explosive. Practicable capacity-achieving
codes were quickly proposed by Jaggi et al. [51], Ho et al. [SU], and Fragouli
and Soljanin |1U]. Applications to network management network tomography 
[SHI iZl, overlay networks [13 IS3 EHl, and wireless networks [H [HI IHl EHl ITTT] 
were studied; capacity in random networks [SH] , undirected networks [03 IHIl , and 
Aref networks [HI] was studied; security aspects were studied [17112011231111112]; the 
extension to non-multicast problems was studied [221 |HH1 IH2l inOl IS2l IHH] ; and further 
code constructions based on convolutional codes and other notions were proposed 
[23 inn 123 1211 113 102]. Most notoriously, network coding has been adopted as a
core technology of Microsoft's Avalanche project [12], a research project that aims
to develop a peer-to-peer file distribution system, which may be in competition with
existing systems such as BitTorrent.

Of the various work on network coding, we draw particular attention to the code 
construction by Ho et al. [OOj- Their construction is very simple: they proposed that 
every node construct its linear code randomly and independently of all other nodes, 
and, while random linear codes were not new (the study of random linear codes 
dates as early as the work of Elias [22] in the 1950s), the application of such codes 
to the network multicast problem was. Some years earlier, Luby [OH] searched for 
codes for the transmission of packets over a lossy link and discovered random linear 
codes, constructed according to a particular distribution, with remarkable complexity 
properties. This work, combined with that of Ho et al., led to a resurgence of interest
in random linear codes (see, e.g., [H |23 123 Ell IH3 IHO]) and to the recognition of a
powerful technique that we shall exploit extensively: random linear coding on packets.

The work we have described has generally focused on coding and capacity — 
growing, as it has, from coding theory and information theory — and has been re- 
moved from networking theory, which generally focuses on notions such as efficiency 
and quality of service. While it is adequate, and indeed appropriate, to start in this 
way, it is clear that, with network coding being concerned with communication
networks, topics under the purview of networking theory must eventually be broached.

This thesis makes an attempt. It introduces four considerations absent from the 
original work of Ahlswede et al.: First, we consider efficiency by defining a cost for 
inefficiency. This is a standard framework in networking theory, which is used, e.g., 
in the optimal routing problem (see, e.g., [131 Sections 5.4-5.7]). Second, we consider 
the lack of synchronization in packet networks, i.e., we allow packet injections and 
receptions on separate links to occur at completely different rates with arbitrary 
degrees of correlation. Third, we consider the possibility of broadcast links, i.e., we 
allow links in the network to reach more than one node, capturing one of the key 
characteristics of wireless networks. Fourth, we consider packet loss, i.e., we allow for 
the possibility that packets are not received at the end or ends of the link into which 
they are injected. 

Some of these considerations are present in other, concurrent work. For example, 
efficiency is also considered in [23 llllj ; and the possibility of broadcast links and 
packet loss are also considered in [^ l6Uj. These papers offer alternative solutions 
to special cases of the problem that we tackle. We take all four considerations and 
give a prescription for operation that is novel and general, yet simple, useful, and 
extensible. 

1.2 Network model 

We set out, in this section, to present our network model. The intent of the model is 
to capture heterogeneous networks composed of wireline and wireless links that may 
or may not be subject to packet losses. Thus, the model captures a wide variety of 
networks, affording us a great degree of generality. 

But that is not to say that we believe that coding should be applied to all networks. 
There is a common concern about the wisdom of doing coding in packet networks since 
coding, being a more complicated operation than routing, increases the computational 
load on nodes, which are often already overtaxed in this regard. Indeed, in high-speed
optical networks, bottlenecks are caused almost exclusively by processing at nodes
rather than by transmission along links IHE]. But high-speed optical networks are
certainly not the only type of network of interest, and there are others where coding 
seems more immediately applicable. Two such types of networks are application- 
level overlay networks and multi-hop wireless networks — in both cases, having coding 
capability at nodes is feasible, and we expect bottlenecks to originate from links rather 
than nodes. 

We represent the topology of the network with a directed hypergraph $\mathcal{H} = (\mathcal{N}, \mathcal{A})$,
where $\mathcal{N}$ is the set of nodes and $\mathcal{A}$ is the set of hyperarcs. A hypergraph is a generalization
of a graph, where, rather than arcs, we have hyperarcs. A hyperarc is a pair
$(i, J)$, where $i$, the start node, is an element of $\mathcal{N}$ and $J$, the set of end nodes, is a
non-empty subset of $\mathcal{N}$.

Each hyperarc $(i, J)$ represents a broadcast link from node $i$ to nodes in the non-empty
set $J$. In the special case where $J$ consists of a single element $j$, we have a
point-to-point link. The hyperarc is now a simple arc and we sometimes write $(i, j)$
instead of $(i, \{j\})$. The link represented by hyperarc $(i, J)$ may be lossless or lossy,
i.e., it may or may not be subject to packet erasures.
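
As a minimal sketch of how such a hypergraph might be represented in software, the following Python fragment models a hyperarc as a start node together with a non-empty set of end nodes. The class name `Hyperarc`, its fields, and the three-node topology are illustrative assumptions, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperarc:
    """A hyperarc (i, J): start node i and a non-empty set J of end nodes."""
    start: int
    ends: frozenset

    def __post_init__(self):
        if not self.ends:
            raise ValueError("the set of end nodes J must be non-empty")

    def is_point_to_point(self) -> bool:
        # A hyperarc with a single end node is an ordinary arc (i, j).
        return len(self.ends) == 1


# Hypothetical topology: node 1 broadcasts to {2, 3}; node 2 has a
# point-to-point link to node 3.
nodes = {1, 2, 3}
hyperarcs = {
    Hyperarc(1, frozenset({2, 3})),
    Hyperarc(2, frozenset({3})),
}

# Every hyperarc must start in, and end within, the node set.
for arc in hyperarcs:
    assert arc.start in nodes and arc.ends <= nodes
```

Whether a given hyperarc is lossless or lossy is deliberately left out of this sketch; it would be a property of the link's reception process rather than of the topology.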

To establish the desired connection or connections, packets are injected on hyperarcs.
Let $A_{iJ}$ be the counting process describing the arrival of packets that are
injected on hyperarc $(i, J)$, and let $A_{iJK}$ be the counting process describing the
arrival of packets that are injected on hyperarc $(i, J)$ and received by exactly the set of
nodes $K \subset J$; i.e., for $\tau \ge 0$, $A_{iJ}(\tau)$ is the total number of packets that are injected
on hyperarc $(i, J)$ between time 0 and time $\tau$, and $A_{iJK}(\tau)$ is the total number of
packets that are injected on hyperarc $(i, J)$ and received by all nodes in $K$ (and no
nodes in $\mathcal{N} \setminus K$) between time 0 and time $\tau$. For example, suppose that three packets
are injected on hyperarc $(1, \{2, 3\})$ between time 0 and time $\tau_0$ and that, of these
three packets, one is received by node 2 only, one is lost entirely, and one is received
by both nodes 2 and 3; then we have $A_{1(23)}(\tau_0) = 3$, $A_{1(23)\emptyset}(\tau_0) = 1$, $A_{1(23)2}(\tau_0) = 1$,
$A_{1(23)3}(\tau_0) = 0$, and $A_{1(23)(23)}(\tau_0) = 1$. We have $A_{1(23)2}(\tau_0) = 1$, not $A_{1(23)2}(\tau_0) = 2$,
because, while two packets are received by node 2, only one is received by exactly
node 2 and no other nodes. Similarly, we have $A_{1(23)3}(\tau_0) = 0$, not $A_{1(23)3}(\tau_0) = 1$,
because, while one packet is received by node 3, none are received by exactly node 3
and no other nodes.
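
The bookkeeping in this example can be checked with a short Python sketch. It assumes that each injected packet is recorded as the set of end nodes that received it; the function name `reception_counts` and this record format are illustrative assumptions.

```python
from collections import Counter


def reception_counts(receptions):
    """Count packets by the exact set of nodes that received them.

    `receptions` holds one entry per injected packet: the set of end nodes
    that received that packet (the empty set means it was lost entirely).
    """
    return Counter(frozenset(r) for r in receptions)


# Three packets injected on hyperarc (1, {2, 3}): one received by node 2 only,
# one lost entirely, one received by both nodes 2 and 3.
receptions = [{2}, set(), {2, 3}]
counts = reception_counts(receptions)

total_injected = len(receptions)        # A_{1(23)}
lost = counts[frozenset()]              # A_{1(23), empty set}
only_2 = counts[frozenset({2})]         # A_{1(23)2}
only_3 = counts[frozenset({3})]         # A_{1(23)3}; a Counter gives 0 if absent
both = counts[frozenset({2, 3})]        # A_{1(23)(23)}

# The per-subset counts partition the injected packets.
assert lost + only_2 + only_3 + both == total_injected
```

Counting by the exact receiving set, rather than per node, is what makes the subset counts add up to the number of injected packets.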

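We assume that $A_{iJ}$ has an average rate $z_{iJ}$ and that $A_{iJK}$ has an average rate
$z_{iJK}$; more precisely, we assume that

\[ \lim_{\tau \to \infty} \frac{A_{iJ}(\tau)}{\tau} = z_{iJ} \]

and that

\[ \lim_{\tau \to \infty} \frac{A_{iJK}(\tau)}{\tau} = z_{iJK} \]

almost surely. Hence, we have $z_{iJ} = \sum_{K \subset J} z_{iJK}$ and, if the link is lossless, we have
$z_{iJK} = 0$ for all $K \subsetneq J$.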

The vector $z$, consisting of $z_{iJ}$, $(i, J) \in \mathcal{A}$, defines the rate at which packets are
injected on all hyperarcs in the network, and we assume that it must lie within some
constraint set $Z$. Thus, the pair $(\mathcal{H}, Z)$ defines a capacitated graph that represents
the network at our disposal, which may be a full, physical network or a subnetwork 
of a physical network. The vector z, then, can be thought of as a subset of this 
capacitated graph — it is the portion actually under use — and we call it the coding 
subgraph for the desired connection or connections. For the time being, we make 
no assumptions about Z except that it is a convex subset of the positive orthant 
containing the origin. This assumption leaves room for Z to take complicated forms; 
and indeed it does, particularly when the underlying physical network is a wireless 
network, where transmissions on one link generally interfere with those on others. For 
examples of forms that Z may take in wireless networks, see P71 13^1371 16 11 ITTTl II 14j . 

We associate with the network a convex cost function $f$ that maps feasible coding
subgraphs to real numbers and that we seek to minimize. This cost function
might represent, e.g., energy consumption, average latency, monetary cost, or a combination
of these considerations. We assume convexity primarily for simplicity and
tractability. Certainly, cases where $f$ is non-convex may still be tractable, but proving
general results is difficult. We expect, at any rate, that most cost functions of
interest will indeed be convex, and this is generally true of cost functions representing 
the considerations that we have mentioned. 

With this set-up, the objective of the efficient operation problem is to establish a 
set of desired connections at specified rates at minimum cost. This is the problem we 
address. 

As the following example will illustrate, the problem we have defined is certainly 
non-trivial. Nevertheless, its scope is limited: we consider rate, or throughput, to be 
the sole factor that is explicitly important in determining the quality of a connection, 
and we consider the rates of packet injections on hyperarcs (i.e., the coding subgraph) 
to be the sole factor that contributes to its cost. Rate is frequently the most impor- 
tant factor under consideration, but there are others. For example, memory usage, 
computational load, and delay are often also important factors. At present, we un- 
fortunately do not have a clean way to consider such factors. We discuss the issue 
further in Section ITU and Chapter El 

1.2.1 An example 

We refer to this example as the slotted Aloha relay channel, and we shall return to 
it throughout the thesis. This example serves to illustrate some of the capabilities of 
our approach, especially as they relate to the issues of broadcast and interference in 
multi-hop wireless networks. 

One of the most important issues in multi-hop wireless networks is medium access,
i.e., determining how radio nodes share the wireless medium. A simple, yet popular,
method for medium access control is slotted Aloha (see, e.g., fSl Section 4.2]), where
nodes with packets to send follow simple random rules to determine when they transmit.
In this example, we consider a multi-hop wireless network using slotted Aloha
for medium access control.

Figure 1.1: The slotted Aloha relay channel.

We suppose that the network has the simple topology shown in Figure 1.1 and
that, in this network, we wish to establish a single unicast connection of rate $R$
from node 1 to node 3. The random rule we take for transmission is that the two
transmitting nodes, node 1 and node 2, each transmit packets independently in a
given time slot with some fixed probability. In coded packet networks, nodes are
never "unbacklogged" as they are in regular, routed slotted Aloha networks: nodes
can transmit coded packets whenever they are given the opportunity. Hence $z_{1(23)}$,
the rate of packet injection on hyperarc $(1, \{2, 3\})$, is the probability that node 1
transmits a packet in a given time slot, and likewise $z_{23}$, the rate of packet injection
on hyperarc $(2, 3)$, is the probability that node 2 transmits a packet in a given time
slot. Therefore, $Z = [0, 1]^2$, i.e., $0 \le z_{1(23)} \le 1$ and $0 \le z_{23} \le 1$.

If node 1 transmits a packet and node 2 does not, then the packet is received at
node 2 with probability $p_{1(23)2}$, at node 3 with probability $p_{1(23)3}$, and at both nodes
2 and 3 with probability $p_{1(23)(23)}$ (it is lost entirely with probability $1 - p_{1(23)2} -
p_{1(23)3} - p_{1(23)(23)}$). If node 2 transmits a packet and node 1 does not, then the packet
is received at node 3 with probability $p_{233}$ (it is lost entirely with probability $1 - p_{233}$).
If both nodes 1 and 2 each transmit a packet, then the packets collide and neither of
the packets is received successfully anywhere.

It is possible that simultaneous transmission does not necessarily result in collision,
with one or more packets being received. This phenomenon is referred to as
multipacket reception capability jl2I and is decided by lower-layer implementation
details. In this example, however, we simply assume that simultaneous transmission
results in collision.

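Hence, we have

\begin{align}
z_{1(23)2} &= z_{1(23)} (1 - z_{23}) p_{1(23)2}, \tag{1.1} \\
z_{1(23)3} &= z_{1(23)} (1 - z_{23}) p_{1(23)3}, \tag{1.2} \\
z_{1(23)(23)} &= z_{1(23)} (1 - z_{23}) p_{1(23)(23)}, \tag{1.3}
\end{align}

and

\begin{equation}
z_{233} = (1 - z_{1(23)}) z_{23} p_{233}. \tag{1.4}
\end{equation}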

We suppose that our objective is to set up the desired connection while minimizing 
the total number of packet transmissions for each message packet, perhaps for the 
sake of energy conservation or conservation of the wireless medium (to allow it to be 
used for other purposes, such as other connections). Therefore

\[ f(z_{1(23)}, z_{23}) = z_{1(23)} + z_{23}. \]
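
To make the relationship between injection rates, reception rates, and cost concrete, the following Python sketch evaluates equations (1.1) to (1.4) and the cost function for one choice of parameters. The function name, the dictionary keys, and the numerical values are hypothetical and chosen only for illustration.

```python
def aloha_relay_rates(z1, z2, p_2only, p_3only, p_both, p_23):
    """Reception rates (1.1)-(1.4) and cost for the slotted Aloha relay channel.

    z1 = z_{1(23)} and z2 = z_{23} are the per-slot transmission probabilities
    of nodes 1 and 2. The p_* arguments are reception probabilities given a
    collision-free transmission.
    """
    if not (0 <= z1 <= 1 and 0 <= z2 <= 1):
        raise ValueError("z must lie in Z = [0, 1]^2")
    if p_2only + p_3only + p_both > 1 or not 0 <= p_23 <= 1:
        raise ValueError("reception probabilities are inconsistent")
    node1_alone = z1 * (1 - z2)   # node 1 transmits while node 2 is silent
    node2_alone = (1 - z1) * z2   # node 2 transmits while node 1 is silent
    rates = {
        "z_1(23)2": node1_alone * p_2only,     # (1.1)
        "z_1(23)3": node1_alone * p_3only,     # (1.2)
        "z_1(23)(23)": node1_alone * p_both,   # (1.3)
        "z_233": node2_alone * p_23,           # (1.4)
    }
    cost = z1 + z2  # f(z_{1(23)}, z_{23})
    return rates, cost


# Hypothetical operating point: both nodes transmit in half of the slots.
rates, cost = aloha_relay_rates(0.5, 0.5, 0.25, 0.25, 0.25, 0.5)
```

The factor $(1 - z_{23})$ in the first three rates and $(1 - z_{1(23)})$ in the last reflects the assumption that simultaneous transmissions always collide.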

The slotted Aloha relay channel is very similar to the relay channel introduced by 
van der Meulen [104], and determining the capacity of the latter is one of the famous,
long-standing, open problems of information theory. The slotted Aloha relay channel 
is related to the relay channel (hence its name), but different. While the relay channel 
relates to the physical layer, we are concerned with higher layers, and our problem is 
ultimately soluble. Whether our solution has any bearing on the relay channel is an 
interesting issue that remains to be explored. 

We return to the slotted Aloha relay channel in Sections 2.2.3 and 3.1.1.



1.3 Thesis outline

The main contribution of this thesis is to lay out, for coded packet networks conforming
to our model, a solution to the efficient operation problem that we have posed,
namely, the problem of establishing a set of desired connections at specified rates at
minimum cost. This solution is contained in Chapters 2 and 3.

Chapter 2 looks at the problem of determining what coding operation each node
should perform given the coding subgraph. We propose using a particular random 
linear coding scheme that we show can establish a single multicast connection at 
rates arbitrarily close to its capacity in a given coding subgraph. This means that, at 
least for establishing a single multicast connection, there is no loss of optimality in 
using this coding scheme and determining the coding subgraph independently. The 
optimality to which we refer is with respect to the efficient operation problem that we 
have defined, which, as we have mentioned, does not explicitly consider factors such 
as memory usage, computational load, and delay. In Section 2.4, we include memory
usage as a factor under explicit consideration. We modify the coding scheme to 
reduce the memory usage of intermediate nodes and assess, by analysis and computer 
simulation, the effect of this modification on various performance factors. 

Chapter 3, on the other hand, looks at the problem of determining the coding
subgraph. We argue that, even when we wish to establish multiple connections, it
suffices, in many instances, simply to use the coding scheme described in Chapter 2
and to determine the coding subgraph independently. Thus, this problem, of deter- 
mining the coding subgraph, can be written as a mathematical programming problem, 
and, under particular assumptions, we find distributed algorithms for performing the 
optimization. We believe that these algorithms may eventually form the basis for 
protocols used in practice. 

In Chapter 4, we evaluate, by computer simulation, the performance of the solution
we laid out and compare it to the performance of existing techniques for routed packet
networks. We find that our solution has the potential to offer significant efficiency
improvements, particularly for multi-hop wireless networks. For some readers, this
chapter may be the one to read first. It can be understood more or less independently
of Chapters 2 and 3 and is, in a sense, "the bottom line", at least in so far as we
have managed to elucidate it. The interested reader may then proceed to Chapters 2
and 3 to understand the solution we propose.

Our conclusion, in Chapter 5, gives a final perspective on our work and discusses
the road ahead. 



Chapter 2 



Network Coding 



This chapter deals with what we call the network coding part of the efficient
operation problem. We assume that the coding subgraph $z$ is given, and we
set out to determine what coding operation each node should perform. We propose 
using a particular random linear coding scheme that we show can establish a single 
multicast connection at rates arbitrarily close to its capacity in $z$. More precisely, for
a given coding subgraph $z$, which gives rise to a particular set of rates $\{z_{iJK}\}$ at which
packets are received, the coding scheme we study achieves (within an arbitrarily small
factor) the maximum possible throughput when run for a sufficiently long period of
time. Exactly how the injection rates defined by $z$ relate to the reception rates $\{z_{iJK}\}$
and how the losses, which establish this relationship, are caused is immaterial for
our result; thus, losses may be due to collisions, link outage, buffer overflow, or any
other process that gives rise to losses. The only condition that we require the losses
to satisfy is that they give rise to packet receptions where the average rates $\{z_{iJK}\}$
exist, as our network model specifies (see Section 1.2).

As a consequence of the result, in establishing a single multicast connection in 
a network, there is no loss of optimality in the efficient operation problem from 
separating subgraph selection and network coding. We deal with subgraph selection 
in Chapter 3.

We begin, in Section 2.1, by precisely specifying the coding scheme we consider;
then, in Section 2.2, we give our main result: that this scheme can establish a single
multicast connection at rates arbitrarily close to its capacity in $z$. In Section 2.3, we
strengthen these results in the special case of Poisson traffic with i.i.d. losses by giving 
error exponents. These error exponents allow us to quantify the rate of decay of the 
probability of error with coding delay and to determine the parameters of importance 
in this decay. 

In both these sections, we consider rate, or throughput, of the desired connection 
to be the sole factor of explicit importance. In Section 2.4, we include memory usage
as a factor of explicit importance. We modify the coding scheme to reduce the memory 
usage of intermediate nodes, and we study the effect of this modification. 

2.1 Coding scheme 

The specific coding scheme we consider is as follows. We suppose that, at the source
node, we have $K$ message packets $w_1, w_2, \ldots, w_K$, which are vectors of length $\lambda$ over
some finite field $\mathbb{F}_q$. (If the packet length is $b$ bits, then we take $\lambda = \lceil b / \log_2 q \rceil$.) The
message packets are initially present in the memory of the source node.

The coding operation performed by each node is simple to describe and is the same 
for every node: received packets are stored into the node's memory, and packets are 
formed for injection with random linear combinations of its memory contents whenever
a packet injection occurs on an outgoing link. The coefficients of the combination
are drawn uniformly from $\mathbb{F}_q$.

Since all coding is linear, we can write any packet $u$ in the network as a linear
combination of $w_1, w_2, \ldots, w_K$, namely, $u = \sum_{k=1}^{K} \gamma_k w_k$. We call $\gamma$ the global encoding
vector of $u$, and we assume that it is sent along with $u$, as side information in its
header. The overhead this incurs (namely, $K \log_2 q$ bits) is negligible if packets are
sufficiently large.
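
The coding operation and the role of the global encoding vector can be illustrated with a small Python sketch. It assumes, purely for simplicity, that the field size is a prime `Q`, so that field arithmetic reduces to integer arithmetic modulo `Q`; the function names, the packet representation as `(gamma, payload)` pairs, and the example messages are all hypothetical.

```python
import random

Q = 257  # assumed prime field size, so F_Q arithmetic is integer arithmetic mod Q


def random_combination(memory, rng):
    """Form one outgoing packet as a random linear combination of `memory`.

    Each stored packet is a pair (gamma, payload): its global encoding vector
    and its contents, both lists of integers in [0, Q).
    """
    k = len(memory[0][0])
    lam = len(memory[0][1])
    gamma_out = [0] * k
    payload_out = [0] * lam
    for gamma, payload in memory:
        c = rng.randrange(Q)  # coefficient drawn uniformly from F_Q
        gamma_out = [(g_out + c * g) % Q for g_out, g in zip(gamma_out, gamma)]
        payload_out = [(p_out + c * p) % Q for p_out, p in zip(payload_out, payload)]
    return gamma_out, payload_out


def payload_from_gamma(gamma, messages):
    """Compute sum_k gamma_k * w_k (mod Q) directly from the message packets."""
    return [sum(g * w[j] for g, w in zip(gamma, messages)) % Q
            for j in range(len(messages[0]))]


# Hypothetical source block: K = 3 message packets of length 4.
messages = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
K = len(messages)
# Message packet w_k starts out with the unit vector e_k as its encoding vector.
source_memory = [([int(i == k) for i in range(K)], w) for k, w in enumerate(messages)]

rng = random.Random(0)
coded = [random_combination(source_memory, rng) for _ in range(2)]
# An intermediate node recombines what it has received without decoding.
gamma, u = random_combination(coded, rng)
assert u == payload_from_gamma(gamma, messages)
```

Because the same coefficients are applied to the encoding vector and to the payload, the relation between a packet and its global encoding vector is preserved at every hop, which is what the final assertion checks.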

Nodes are assumed to have unlimited memory. The scheme can be modified so
that received packets are stored into memory only if their global encoding vectors
are linearly independent of those already stored. This modification keeps our results
unchanged while ensuring that nodes never need to store more than $K$ packets. The
case where nodes can only store fewer than $K$ packets is discussed in Section 2.4.
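
One way such a linear-independence check might be implemented is sketched below in Python, again under the simplifying assumption of a prime field size `Q`. The node keeps the stored encoding vectors in an echelon-like form and accepts a new vector only if a non-zero residue remains after reduction. The function names and the data layout are illustrative assumptions.

```python
Q = 257  # assumed prime field size


def reduce_against(basis, vec):
    """Reduce `vec` against the stored rows and return the residue.

    `basis` is a list of (pivot, row) pairs in which row[pivot] == 1 and each
    row is zero at the pivots of all rows stored before it.
    """
    vec = [v % Q for v in vec]
    for pivot, row in basis:
        if vec[pivot]:
            factor = vec[pivot]
            vec = [(v - factor * r) % Q for v, r in zip(vec, row)]
    return vec


def store_if_innovative(basis, gamma):
    """Store `gamma` only if it is linearly independent of the stored vectors."""
    residue = reduce_against(basis, gamma)
    pivot = next((i for i, v in enumerate(residue) if v), None)
    if pivot is None:
        return False  # linearly dependent on what is stored: discard
    inv = pow(residue[pivot], -1, Q)  # modular inverse (Python 3.8+)
    basis.append((pivot, [(v * inv) % Q for v in residue]))
    return True


basis = []
accepted = [store_if_innovative(basis, g)
            for g in ([1, 0, 0], [2, 0, 0], [1, 1, 0], [3, 5, 0], [0, 0, 7])]
```

With encoding vectors of length $K$, at most $K$ vectors can ever be accepted, which matches the storage bound stated above. In a full implementation the payload would be reduced alongside its encoding vector; that step is omitted here.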

A sink node collects packets and, if it has K packets with linearly-independent 
global encoding vectors, it is able to recover the message packets. Decoding can be 
done by Gaussian elimination. The scheme can be run either for a predetermined 
duration or, in the case of rateless operation, until successful decoding at the sink 
nodes. We summarize the scheme in Figure ITTl 

The scheme is carried out for a single block of K message packets at the source. 
If the source has more packets to send, then the scheme is repeated with all nodes 
flushed of their memory contents. 
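The scheme described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes a prime field size (the name `Q` and the value 257 are illustrative choices, so that arithmetic modulo `Q` is a field), represents each packet as a pair of its global encoding vector and its contents, and uses the hypothetical helper names `random_combination` and `decode`. A single relay stands in for the network.

```python
import random

Q = 257  # assumed prime field size, so that integers mod Q form a field


def random_combination(memory, rng):
    """Form one outgoing packet as a random linear combination of a
    non-empty memory. Each stored packet is a pair (gamma, payload)."""
    K, lam = len(memory[0][0]), len(memory[0][1])
    gamma, payload = [0] * K, [0] * lam
    for g, p in memory:
        a = rng.randrange(Q)  # coefficient drawn uniformly from the field
        gamma = [(x + a * y) % Q for x, y in zip(gamma, g)]
        payload = [(x + a * y) % Q for x, y in zip(payload, p)]
    return gamma, payload


def decode(packets, K):
    """Gaussian elimination on the rows [gamma | payload]. Returns the K
    message packets, or None if the encoding vectors have rank below K."""
    rows = [list(g) + list(p) for g, p in packets]
    for col in range(K):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # decoding error: not enough independent vectors
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], Q - 2, Q)  # inverse by Fermat's little theorem
        rows[col] = [(x * inv) % Q for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % Q for x, y in zip(rows[r], rows[col])]
    return [row[K:] for row in rows[:K]]


rng = random.Random(1)
K, lam = 4, 6
messages = [[rng.randrange(Q) for _ in range(lam)] for _ in range(K)]
# The source starts with the message packets, whose global encoding
# vectors are the unit vectors.
source = [([int(i == k) for i in range(K)], messages[k]) for k in range(K)]
relay = [random_combination(source, rng) for _ in range(K + 2)]
sink = [random_combination(relay, rng) for _ in range(K + 2)]
recovered = decode(sink, K)
```

The relay recodes without decoding, and the sink recovers the messages whenever the global encoding vectors it holds have rank $K$, which here happens with probability very close to 1.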

Related random linear coding schemes are described in [23 1^0] for the application 
of multicast over lossless wireline packet networks, in [20] for data dissemination, and 
in PP for data storage. Other coding schemes for lossy packet networks are described 
in and [^0]; the scheme described in the former requires placing in the packet 
headers side information that grows with the size of the network, while that described 
in the latter requires no side information at all, but achieves lower rates in general. 
Both of these coding schemes, moreover, operate in a block-by-block manner, where 
coded packets are sent by intermediate nodes only after decoding a block of received 
packets — a strategy that generally incurs more delay than the scheme we describe, 
where intermediate nodes perform additional coding yet do not decode [85].

2.2 Coding theorems 

In this section, we specify achievable rate intervals for the coding scheme in various 
scenarios. The fact that the intervals we specify are the largest possible (i.e., that 
the scheme is capacity-achieving) can be seen by simply noting that the rate of a 
connection must be limited by the rate at which distinct packets are being received
over any cut between the source and the sink. A formal converse can be obtained
using the cut-set bound for multi-terminal networks (see [211 Section 14.10]).

Initialization:

• The source node stores the message packets $w_1, w_2, \ldots, w_K$ in its
memory.

Operation:

• When a packet is received by a node,

  - the node stores the packet in its memory.

• When a packet injection occurs on an outgoing link of a node,

  - the node forms the packet from a random linear combination
  of the packets in its memory. Suppose the node has $L$ packets
  $u_1, u_2, \ldots, u_L$ in its memory. Then the packet formed is

  $$u_0 := \sum_{l=1}^{L} \alpha_l u_l,$$

  where $\alpha_l$ is chosen according to a uniform distribution over the
  elements of $\mathbb{F}_q$. The packet's global encoding vector $\gamma$, which
  satisfies $u_0 = \sum_{k=1}^{K} \gamma_k w_k$, is placed in its header.

Decoding:

• Each sink node performs Gaussian elimination on the set of global
encoding vectors from the packets in its memory. If it is able to find an
inverse, it applies the inverse to the packets to obtain $w_1, w_2, \ldots, w_K$;
otherwise, a decoding error occurs.

Figure 2.1: Summary of the random linear coding scheme we consider.

Figure 2.2: A network consisting of two point-to-point links in tandem.

2.2.1 Unicast connections 

We develop our general result for unicast connections by extending from some special
cases. We begin with the simplest non-trivial case: that of two point-to-point links
in tandem (see Figure 2.2).

Suppose we wish to establish a connection of rate arbitrarily close to $R$ packets
per unit time from node 1 to node 3. Suppose further that the coding scheme is run
for a total time $\Delta$, from time 0 until time $\Delta$, and that, in this time, a total of $N$
packets is received by node 2. We call these packets $v_1, v_2, \ldots, v_N$.

Any packet $u$ received by a node is a linear combination of $v_1, v_2, \ldots, v_N$, so we
can write

$$u = \sum_{n=1}^{N} \beta_n v_n.$$

Now, since $v_n$ is formed by a random linear combination of the message packets
$w_1, w_2, \ldots, w_K$, we have

$$v_n = \sum_{k=1}^{K} \alpha_{nk} w_k$$

for $n = 1, 2, \ldots, N$. Hence

$$u = \sum_{k=1}^{K} \left( \sum_{n=1}^{N} \beta_n \alpha_{nk} \right) w_k,$$

and it follows that the $k$th component of the global encoding vector of $u$ is given by

$$\gamma_k = \sum_{n=1}^{N} \beta_n \alpha_{nk}.$$

We call the vector $\beta$ associated with $u$ the auxiliary encoding vector of $u$, and we see
that any node that receives $\lfloor K(1+\varepsilon) \rfloor$ or more packets with linearly-independent
auxiliary encoding vectors has $\lfloor K(1+\varepsilon) \rfloor$ packets whose global encoding vectors
collectively form a random $\lfloor K(1+\varepsilon) \rfloor \times K$ matrix over $\mathbb{F}_q$, with all entries chosen
uniformly. If this matrix has rank $K$, then node 3 is able to recover the message
packets. The probability that a random $\lfloor K(1+\varepsilon) \rfloor \times K$ matrix has rank $K$ is, by a
simple counting argument, $\prod_{k=1+\lfloor K(1+\varepsilon) \rfloor-K}^{\lfloor K(1+\varepsilon) \rfloor} (1 - 1/q^k)$, which can be made arbitrarily
close to 1 by taking $K$ arbitrarily large. Therefore, to determine whether node 3 can
recover the message packets, we essentially need only to determine whether it receives
$\lfloor K(1+\varepsilon) \rfloor$ or more packets with linearly-independent auxiliary encoding vectors.
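The counting argument for the rank of a random matrix can be checked numerically. The sketch below evaluates the product $\prod_k (1 - q^{-k})$ over the $K$ largest indices up to $\lfloor K(1+\varepsilon) \rfloor$, which is the standard expression for the probability that a uniformly random matrix of that size over a field with $q$ elements has rank $K$. The function name `full_rank_probability` and the parameter values are illustrative assumptions.

```python
from math import floor


def full_rank_probability(K, eps, q):
    """Probability that a uniformly random floor(K(1+eps)) x K matrix over
    a field with q elements has rank K."""
    m = floor(K * (1 + eps))
    p = 1.0
    for k in range(m - K + 1, m + 1):
        p *= 1.0 - q ** (-k)
    return p


# With no excess packets over the binary field, a 2 x 2 matrix is
# invertible with probability 6/16.
square_binary = full_rank_probability(2, 0.0, 2)
# Allowing a fraction eps of excess packets pushes the probability towards 1.
with_excess = full_rank_probability(10, 0.5, 2)
```

For fixed $\varepsilon > 0$, the lower limit of the product grows with $K$, so the discarded factors are exactly those that are furthest from 1.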

Our proof is based on tracking the propagation of what we call innovative packets.
Such packets are innovative in the sense that they carry new, as yet unknown,
information about $v_1, v_2, \ldots, v_N$ to a node. It turns out that the propagation of
innovative packets through a network follows the propagation of jobs through a queueing
network, for which fluid flow models give good approximations. We present the
following argument in terms of this fluid analogy and defer the formal argument to
Appendix 2.A.1 at the end of this chapter.

Since the packets being received by node 2 are the packets $v_1, v_2, \ldots, v_N$ themselves,
it is clear that every packet being received by node 2 is innovative. Thus,
innovative packets arrive at node 2 at a rate of $z_{122}$, and this can be approximated by
fluid flowing in at rate $z_{122}$. These innovative packets are stored in node 2's memory,
so the fluid that flows in is stored in a reservoir.

Packets, now, are being received by node 3 at a rate of $z_{233}$, but whether these
packets are innovative depends on the contents of node 2's memory. If node 2 has more
information about $v_1, v_2, \ldots, v_N$ than node 3 does, then it is highly likely that new
information will be described to node 3 in the next packet that it receives. Otherwise,
if node 2 and node 3 have the same degree of information about $v_1, v_2, \ldots, v_N$, then
packets received by node 3 cannot possibly be innovative. Thus, the situation is as
though fluid flows into node 3's reservoir at a rate of $z_{233}$, but the level of node 3's
reservoir is restricted from ever exceeding that of node 2's reservoir. The level of
node 3's reservoir, which is ultimately what we are concerned with, can equivalently
be determined by fluid flowing out of node 2's reservoir at rate $z_{233}$.

Figure 2.3: Fluid flow system corresponding to two-link tandem network.

We therefore see that the two-link tandem network in Figure 2.2 maps to the fluid
flow system shown in Figure 2.3. It is clear that, in this system, fluid flows into node
3's reservoir at rate $\min(z_{122}, z_{233})$. This rate determines the rate at which packets
with new information about $v_1, v_2, \ldots, v_N$ (and, therefore, linearly-independent
auxiliary encoding vectors) arrive at node 3. Hence the time required for node 3 to
receive $\lfloor K(1+\varepsilon) \rfloor$ packets with linearly-independent auxiliary encoding vectors is,
for large $K$, approximately $K(1+\varepsilon)/\min(z_{122}, z_{233})$, which implies that a connection
of rate arbitrarily close to $R$ packets per unit time can be established provided that

$$R \le \min(z_{122}, z_{233}). \tag{2.1}$$

The right-hand side of (2.1) is indeed the capacity of the two-link tandem network,
and we therefore have the desired result for this case.
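The reservoir picture can be mimicked with a small simulation. The sketch below is a simplification under stated assumptions, not the formal argument: time is slotted, each link delivers at most one packet per slot (so the rates must be at most 1), the field is taken to be large enough that a packet from node 2 is innovative at node 3 whenever node 2 knows more than node 3, and the function name `tandem_innovative_rate` is hypothetical.

```python
import random


def tandem_innovative_rate(z122, z233, slots, seed=0):
    """Simulate the levels of innovative packets at nodes 2 and 3 and
    return the rate at which node 3's level grows."""
    rng = random.Random(seed)
    level2 = level3 = 0
    for _ in range(slots):
        if rng.random() < z122:
            level2 += 1  # every packet received by node 2 is innovative
        if rng.random() < z233 and level3 < level2:
            level3 += 1  # innovative only if node 2 knows more than node 3
    return level3 / slots


first_link_limits = tandem_innovative_rate(0.5, 0.8, 200_000)
second_link_limits = tandem_innovative_rate(0.8, 0.3, 200_000)
```

In both runs the measured rate should sit close to the smaller of the two link rates, in line with (2.1).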

We extend our result to another special case before considering general unicast
connections: we consider the case of a tandem network consisting of $L$ point-to-point
links and $L + 1$ nodes (see Figure 2.4).

Figure 2.4: A network consisting of L point-to-point links in tandem.

Figure 2.5: Fluid flow system corresponding to L-link tandem network.

This case is a straightforward extension of that of the two-link tandem network. It
maps to the fluid flow system shown in Figure 2.5. In this system, it is clear that fluid
flows into node $(L+1)$'s reservoir at rate $\min_{1 \le i \le L}\{z_{i(i+1)(i+1)}\}$. Hence a connection
of rate arbitrarily close to $R$ packets per unit time from node 1 to node $L + 1$ can be
established provided that

$$R \le \min_{1 \le i \le L}\{z_{i(i+1)(i+1)}\}. \tag{2.2}$$

Since the right-hand side of (2.2) is indeed the capacity of the L-link tandem network,
we therefore have the desired result for this case. A formal argument is in
Appendix 2.A.2.

We now extend our result to general unicast connections. The strategy here is
simple: A general unicast connection can be formulated as a flow, which can be
decomposed into a finite number of paths. Each of these paths is a tandem network,
which is the case that we have just considered.

Suppose that we wish to establish a connection of rate arbitrarily close to $R$
packets per unit time from source node $s$ to sink node $t$. Suppose further that

$$R \le \min_{Q \in \mathcal{Q}(s,t)} \left\{ \sum_{(i,J) \in \Gamma_+(Q)} \sum_{K \not\subset Q} z_{iJK} \right\},$$

where $\mathcal{Q}(s,t)$ is the set of all cuts between $s$ and $t$, and $\Gamma_+(Q)$ denotes the set of
forward hyperarcs of the cut $Q$, i.e.,

$$\Gamma_+(Q) := \{(i,J) \in \mathcal{A} \mid i \in Q,\ J \setminus Q \neq \emptyset\}.$$

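Therefore, by the max-flow/min-cut theorem (see, e.g., |31 Sections 6.5-6.7], fOl
Section 3.1]), there exists a flow vector $x$ satisfying

$$\sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj} - \sum_{\{j \mid (j,I) \in \mathcal{A},\, i \in I\}} x_{jIi} = \begin{cases} R & \text{if } i = s, \\ -R & \text{if } i = t, \\ 0 & \text{otherwise,} \end{cases}$$

for all $i \in \mathcal{N}$,

$$\sum_{j \in K} x_{iJj} \le \sum_{\{L \subset J \mid L \cap K \neq \emptyset\}} z_{iJL} \tag{2.3}$$

for all $(i,J) \in \mathcal{A}$ and $K \subset J$, and $x_{iJj} \ge 0$ for all $(i,J) \in \mathcal{A}$ and $j \in J$.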

Using the conformal realization theorem (see, e.g., ^Hl Section 1.1]), we decompose
$x$ into a finite set of paths $\{p_1, p_2, \ldots, p_M\}$, each carrying positive flow $R_m$ for
$m = 1, 2, \ldots, M$, such that $\sum_{m=1}^{M} R_m = R$. We treat each path $p_m$ as a tandem network
and use it to deliver innovative packets at rate arbitrarily close to $R_m$, resulting in
an overall rate for innovative packets arriving at node $t$ that is arbitrarily close to $R$.
Some care must be taken in the interpretation of the flow and its path decomposition
because the same packet may be received by more than one node. The details of the
interpretation are in Appendix 2.A.3.

2.2.2 Multicast connections 

The result for multicast connections is, in fact, a straightforward extension of that
for unicast connections. In this case, rather than a single sink $t$, we have a set of
sinks $T$. As in the framework of static broadcasting (see jHH EH]), we allow sink
nodes to operate at different rates. We suppose that sink $t \in T$ wishes to achieve rate
arbitrarily close to $R_t$, i.e., to recover the $K$ message packets, sink $t$ wishes to wait
for a time $\Delta_t$ that is only marginally greater than $K/R_t$. We further suppose that

$$R_t \le \min_{Q \in \mathcal{Q}(s,t)} \left\{ \sum_{(i,J) \in \Gamma_+(Q)} \sum_{K \not\subset Q} z_{iJK} \right\}$$

for all $t \in T$. Therefore, by the max-flow/min-cut theorem, there exists, for each
$t \in T$, a flow vector $x^{(t)}$ satisfying

$$\sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x^{(t)}_{iJj} - \sum_{\{j \mid (j,I) \in \mathcal{A},\, i \in I\}} x^{(t)}_{jIi} = \begin{cases} R_t & \text{if } i = s, \\ -R_t & \text{if } i = t, \\ 0 & \text{otherwise,} \end{cases}$$

for all $i \in \mathcal{N}$,

$$\sum_{j \in K} x^{(t)}_{iJj} \le \sum_{\{L \subset J \mid L \cap K \neq \emptyset\}} z_{iJL}$$

for all $(i,J) \in \mathcal{A}$ and $K \subset J$, and $x^{(t)}_{iJj} \ge 0$ for all $(i,J) \in \mathcal{A}$ and $j \in J$.

For each flow vector $x^{(t)}$, we go through the same argument as that for a unicast
connection, and we find that the probability of error at every sink node can be made
arbitrarily small by taking $K$ sufficiently large.

We summarize our results with the following theorem statement.

Theorem 2.1. Consider the coding subgraph $z$. The random linear coding scheme
described in Section 2.1 is capacity-achieving for multicast connections in $z$, i.e., for
$K$ sufficiently large, it can achieve, with arbitrarily small error probability, a multicast
connection from source node $s$ to sink nodes in the set $T$ at rate arbitrarily close to
$R_t$ packets per unit time for each $t \in T$ if

$$R_t \le \min_{Q \in \mathcal{Q}(s,t)} \left\{ \sum_{(i,J) \in \Gamma_+(Q)} \sum_{K \not\subset Q} z_{iJK} \right\}$$

for all $t \in T$.¹

Remark. The capacity region is determined solely by the average rates $\{z_{iJK}\}$ at
which packets are received. Thus, the packet injection and loss processes, which give
rise to the packet reception processes, can in fact take any distribution, exhibiting
arbitrary correlations, as long as these average rates exist.

2.2.3 An example 

We return to the slotted Aloha relay channel described in Section 1.2.1. Theorem 2.1
implies that the random linear coding scheme we consider can achieve the desired
unicast connection at rates arbitrarily close to $R$ packets per unit time if

$$R \le \min(z_{1(23)2} + z_{1(23)3} + z_{1(23)(23)},\ z_{1(23)3} + z_{1(23)(23)} + z_{233}).$$

Substituting the Chapter 1 expressions for these rates, we obtain

$$R \le \min\big(z_{1(23)}(1 - z_{23})(p_{1(23)2} + p_{1(23)3} + p_{1(23)(23)}),\ z_{1(23)}(1 - z_{23})(p_{1(23)3} + p_{1(23)(23)}) + (1 - z_{1(23)}) z_{23} p_{233}\big).$$

We see that the range of achievable rates is specified completely in terms of the
parameters we control, $z_{1(23)}$ and $z_{23}$, and the given parameters of the problem, $p_{1(23)2}$,
$p_{1(23)3}$, $p_{1(23)(23)}$, and $p_{233}$. It remains only to choose $z_{1(23)}$ and $z_{23}$. This, we deal with
in the next chapter.
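The cut condition of Theorem 2.1 can be evaluated by brute force on a network as small as the relay channel. The sketch below is illustrative only: it enumerates every cut containing the source, and for each cut sums $z_{iJK}$ over hyperarcs leaving the cut and reception sets $K$ not contained in it. The function name `min_cut_rate`, the dictionary layout, and the numerical rates are assumptions made for the example, not values from this work.

```python
from itertools import combinations


def min_cut_rate(nodes, z, s, t):
    """Minimum over cuts Q (s in Q, t not in Q) of the total rate of packets
    injected inside Q and received by a set K not contained in Q.
    z maps (i, J, K) to the rate z_iJK, with J and K given as frozensets."""
    others = [n for n in nodes if n not in (s, t)]
    best = float("inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            Q = {s, *extra}
            # K not contained in Q implies that the hyperarc (i, J) is a
            # forward hyperarc of the cut, since K is a subset of J.
            cut = sum(rate for (i, J, K), rate in z.items()
                      if i in Q and not K <= Q)
            best = min(best, cut)
    return best


# Relay channel: node 1 broadcasts to {2, 3}; node 2 transmits to node 3.
J1 = frozenset({2, 3})
z = {
    (1, J1, frozenset({2})): 0.25,        # received by node 2 only
    (1, J1, frozenset({3})): 0.125,       # received by node 3 only
    (1, J1, frozenset({2, 3})): 0.25,     # received by both
    (2, frozenset({3}), frozenset({3})): 0.5,
}
capacity = min_cut_rate([1, 2, 3], z, s=1, t=3)
```

With these made-up rates the two cuts evaluate to 0.625 and 0.875, matching the two arguments of the minimum in the displayed condition. Enumerating cuts is exponential in the number of nodes, so this is only sensible for toy examples.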

¹ In earlier versions of this work [701 176|, we required the field size $q$ of the coding scheme to
approach infinity for Theorem 2.1 to hold. This requirement is in fact not necessary, and the formal
arguments in Appendix 2.A do not require it.




2.3 Error exponents for Poisson traffic with i.i.d. losses



We now look at the rate of decay of the probability of error $p_e$ in the coding delay $\Delta$.
In contrast to traditional error exponents where coding delay is measured in symbols,
we measure coding delay in time units: time $\tau = \Delta$ is the time at which the sink
nodes attempt to decode the message packets. The two methods of measuring delay
are essentially equivalent when packets arrive in regular, deterministic intervals.

We specialize to the case of Poisson traffic with i.i.d. losses. Thus, the process
$A_{iJK}$ is a Poisson process with rate $z_{iJK}$. Consider the unicast case for now, and
suppose we wish to establish a connection of rate $R$. Let $C$ be the supremum of all
asymptotically-achievable rates.

We begin by deriving an upper bound on the probability of error. To this end,
we take a flow vector $x$ from $s$ to $t$ of size $C$ and, following the development in
Appendix 2.A, develop a queueing network from it that describes the propagation
of innovative packets for a given innovation order $\mu$. This queueing network now
becomes a Jackson network. Moreover, as a consequence of Burke's theorem (see,
e.g., [Sni Section 2.1]) and the fact that the queueing network is acyclic, the arrival
and departure processes at all stations are Poisson in steady-state.

Let $\Psi_t(m)$ be the arrival time of the $m$th innovative packet at $t$, and let $C' :=
(1 - q^{-\mu})C$. When the queueing network is in steady-state, the arrival of innovative
packets at $t$ is described by a Poisson process of rate $C'$. Hence we have

$$\lim_{m \to \infty} \frac{1}{m} \log E[\exp(\theta \Psi_t(m))] = \log \frac{C'}{C' - \theta} \tag{2.4}$$

for $\theta < C'$ IH7] . If an error occurs, then fewer than $\lceil R\Delta \rceil$ innovative packets
are received by $t$ by time $\tau = \Delta$, which is equivalent to saying that $\Psi_t(\lceil R\Delta \rceil) > \Delta$.
Therefore,

$$p_e \le \Pr(\Psi_t(\lceil R\Delta \rceil) > \Delta),$$

and, using the Chernoff bound, we obtain

$$p_e \le \min_{0 \le \theta < C'} \exp\left(-\theta\Delta + \log E[\exp(\theta \Psi_t(\lceil R\Delta \rceil))]\right).$$

Let $\varepsilon$ be a positive real number. Then using equation (2.4) we obtain, for $\Delta$ sufficiently
large,

$$p_e \le \min_{0 \le \theta < C'} \exp\left(-\theta\Delta + R\Delta\left\{\log\frac{C'}{C' - \theta} + \varepsilon\right\}\right) = \exp(-\Delta(C' - R - R\log(C'/R)) + R\Delta\varepsilon).$$

Hence, we conclude that

$$\lim_{\Delta \to \infty} \frac{-\log p_e}{\Delta} \ge C' - R - R\log(C'/R). \tag{2.5}$$

For the lower bound, we examine a cut whose flow capacity is $C$. We take one
such cut and denote it by $Q^*$. It is clear that, if fewer than $\lceil R\Delta \rceil$ distinct packets
are received across $Q^*$ in time $\tau = \Delta$, then an error occurs. The arrival of distinct
packets across $Q^*$ is described by a Poisson process of rate $C$. Thus we have

$$p_e \ge \exp(-C\Delta) \sum_{l=0}^{\lceil R\Delta \rceil - 1} \frac{(C\Delta)^l}{l!}$$

and, using Stirling's formula, we obtain

$$\lim_{\Delta \to \infty} \frac{-\log p_e}{\Delta} \le C - R - R\log(C/R). \tag{2.6}$$

Since (2.5) holds for all positive integers $\mu$, we conclude from (2.5) and (2.6) that

$$\lim_{\Delta \to \infty} \frac{-\log p_e}{\Delta} = C - R - R\log(C/R). \tag{2.7}$$

Equation (2.7) defines the asymptotic rate of decay of the probability of error
in the coding delay $\Delta$. This asymptotic rate of decay is determined entirely by $R$
and $C$. Thus, for a packet network with Poisson traffic and i.i.d. losses employing
the coding scheme described in Section 2.1, the flow capacity $C$ of the minimum cut
of the network is essentially the sole figure of merit of importance in determining
the effectiveness of the coding scheme for large, but finite, coding delay. Hence, in
deciding how to inject packets to support the desired connection, a sensible approach
is to reduce our attention to this figure of merit, which is indeed the approach that
we take in Chapter 3.

Extending the result from unicast connections to multicast connections is
straightforward: we simply obtain (2.7) for each sink.
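As a numerical sanity check on the exponent $C - R - R\log(C/R)$, the sketch below compares it with the decay rate of the Poisson tail that appears in the lower bound, namely the probability that a Poisson random variable of mean $C\Delta$ is smaller than $\lceil R\Delta \rceil$. The function names and the values $C = 2$, $R = 1$, $\Delta = 2000$ are illustrative assumptions; the tail is summed in the log domain to avoid underflow.

```python
from math import ceil, exp, lgamma, log


def error_exponent(C, R):
    """Asymptotic decay rate of the error probability in the coding delay."""
    return C - R - R * log(C / R)


def poisson_tail_decay(C, R, delta):
    """-(1/delta) * log Pr(N < ceil(R*delta)) for N ~ Poisson(C*delta)."""
    mean = C * delta
    n = ceil(R * delta)
    # log of each term exp(-mean) * mean**l / l!, for l = 0, ..., n - 1
    log_terms = [-mean + l * log(mean) - lgamma(l + 1) for l in range(n)]
    top = max(log_terms)
    log_prob = top + log(sum(exp(x - top) for x in log_terms))
    return -log_prob / delta


exponent = error_exponent(2.0, 1.0)
finite_delay = poisson_tail_decay(2.0, 1.0, 2000)
```

At a finite delay the two quantities differ by a term of order $\log(\Delta)/\Delta$, so the agreement improves slowly as $\Delta$ grows.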

2.4 Finite-memory random linear coding 

The results that we have thus far established about the coding scheme described
in Section 2.1 show that, from the perspective of conveying the most information in
each packet transmission, it does very well. But packet transmissions are not the only
resource with which we are concerned. Other resources that may be scarce include
memory and computation and, if these resources are as important or more important
than packet transmissions, then a natural question is whether we can modify the
coding scheme of Section 2.1 to reduce its memory and computation requirements,
possibly in exchange for more transmissions.

In this section, we study a simple modification. We take the coding scheme of
Section 2.1 and we assume that intermediate nodes (i.e., nodes that are neither
source nor sink nodes) have memories capable only of storing a fixed, finite number
of packets, irrespective of $K$. An intermediate node with a memory capable of storing
$M$ packets uses its memory in one of two ways:

1. as a shift register: arriving packets are stored in memory and, if the memory is
already full, the oldest packet in the memory is discarded; or

2. as an accumulator: arriving packets are multiplied by a random vector chosen
uniformly over $\mathbb{F}_q^M$, and the product is added to the $M$ memory slots.
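The two ways of using the memory can be sketched as follows. This is a minimal illustration rather than the implementation studied here: it assumes a prime field size `Q` (an illustrative choice), packets represented as lists of field elements, and the hypothetical class names `ShiftRegisterMemory` and `AccumulatorMemory`.

```python
import random
from collections import deque

Q = 257  # assumed prime field size, so that integers mod Q form a field


class ShiftRegisterMemory:
    """Keeps the M most recent packets; the oldest is discarded when full."""

    def __init__(self, M):
        self.slots = deque(maxlen=M)  # a full deque drops its oldest entry

    def store(self, packet):
        self.slots.append(list(packet))


class AccumulatorMemory:
    """Adds a random multiple of each arriving packet to each of M slots."""

    def __init__(self, M, packet_length, rng):
        self.slots = [[0] * packet_length for _ in range(M)]
        self.rng = rng

    def store(self, packet):
        for slot in self.slots:
            c = self.rng.randrange(Q)  # one coordinate of the random vector
            for j, x in enumerate(packet):
                slot[j] = (slot[j] + c * x) % Q


shift_register = ShiftRegisterMemory(2)
for packet in ([1], [2], [3]):
    shift_register.store(packet)

accumulator = AccumulatorMemory(3, 2, random.Random(0))
accumulator.store([1, 2])
```

In either case an outgoing packet would then be formed as a random linear combination of the $M$ slots, as in the unmodified scheme.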

We first consider, in Section 2.4.1, the case of a single intermediate node in
isolation. In this case, the intermediate node encodes packets and its immediate
downstream node decodes them. Such a scheme offers an attractive alternative to
comparable reliability schemes for a single link, such as automatic repeat request (ARQ) or
convolutional coding (see, e.g., |Slini). In Section 2.4.2, we consider a network,
specifically, the two-link tandem network (see Figure 2.2). We see that, while limiting the
memory of intermediate nodes certainly results in loss of achievable rate, the relative
rate loss, at least for the two-link tandem network, can be quantified, and it decays
exponentially in $M$.

2.4.1 Use in isolation 

When used in isolation at a single intermediate node, the encoder takes an incoming
stream of message packets, $u_1, u_2, \ldots$, and forms a coded stream of packets that is
placed on its lossy outgoing link and decoded on reception. We assume that the
decoder knows, for each received packet, the linear transformation that has been
performed on the message packets to yield that packet. This information can be
communicated to the decoder by a variety of means, which include placing it into
the header of each packet as described in Section 2.1 (which is certainly viable when
the memory is used as a shift register; the overhead is $M \log_2 q$ bits plus that of a
sequence number), and initializing the random number generators at the encoder and
decoder with the same seed.

The task of decoding, then, equates to matrix inversion in $\mathbb{F}_q$, which can be done
straightforwardly by applying Gaussian elimination to each packet as it is received.
This procedure produces an approximately-steady stream of decoded packets with an
expected delay that is constant in the length of the input stream. Moreover, if the
memory is used as a shift register, then the complexity of this decoding procedure
is also constant with the length of the input stream and, on average, is $O(M^2)$ per
packet.

Figure 2.6: Markov chain modeling the evolution of the difference between the number
of packets received by the encoder and the number of packets transmitted and not
lost.

We discretize the time axis into epochs that correspond to the transmission of 
an outgoing packet. Thus, in each epoch, an outgoing packet is transmitted, which 
may be lost, and one or more incoming packets are received. If transmission is to be 
reliable, then the average number of incoming packets received in each epoch must 
be at most one. 

We make the following assumptions on incoming packet arrivals and outgoing
packet losses, with the understanding that generalizations are certainly possible. We
assume that, in an epoch, a single packet arrives independently with probability $r$ and
no packets arrive otherwise, and the transmitted outgoing packet is lost independently
with probability $\varepsilon$ and is received otherwise. This model is appropriate when losses
and arrivals are steady and not bursty.

We conduct our analysis in the limit of $q \to \infty$, i.e., the limit of infinite field
size. We later discuss how the analysis may be adapted for finite $q$, and quantify by
simulation the difference between the performance in the case of finite $q$ and that of
infinite $q$ in some particular instances.

We begin by considering the difference between the number of packets received by
the encoder and the number of packets transmitted and not lost. This quantity, we
see, evolves according to the infinite-state Markov chain shown in Figure 2.6, where
$\alpha = r\varepsilon$, $\beta = (1 - r)(1 - \varepsilon)$, and $\gamma = r(1 - \varepsilon) + (1 - r)\varepsilon$.

At the first epoch, the memory of the encoder is empty and we are in state 0. We
continue to remain in state 0 in subsequent epochs until the first packet $u_1$ arrives.
Consider the first outgoing packet after the arrival of $u_1$. This packet is either lost
or not. Let us first suppose that it is not lost. Thus, we remain in state 0, and the
decoder receives a packet that is a random linear combination of $u_1$, i.e., a random
scalar multiple of $u_1$, and, since $q$ is infinitely large by assumption, this scalar multiple
is non-zero with probability 1; so the decoder can recover $u_1$ from the packet that it
receives.

Now suppose instead that the first outgoing packet after the arrival of $u_1$ is lost.
Thus, we move to state 1. If an outgoing packet is transmitted and not lost before the
next packet arrives, the decoder again receives a random scalar multiple of $u_1$ and we
return to state 0. So suppose we are in state 1 and $u_2$ arrives. Then, the next outgoing
packet is a random linear combination of $u_1$ and $u_2$. Suppose further that this packet
is received by the decoder, so we are again in state 1. This packet, currently, is more
or less useless to the decoder; it represents a mixture between $u_1$ and $u_2$ and does not
allow us to determine either. Nevertheless, it gives, with probability 1, the decoder
some information that it did not previously know, namely, that $u_1$ and $u_2$ lie in a
particular linear subspace of the ambient vector space over $\mathbb{F}_q$. As in Section 2.2.1,
we call such an informative packet innovative.

Any subsequent packet received by the decoder is also innovative with probability
1. In particular, if the decoder receives a packet before the arrival of another packet
$u_3$ at the encoder, returning us to state 0, then the decoder is able to recover both
$u_1$ and $u_2$. More generally, what we see is that, provided that packets arrive only in
states $0, 1, \ldots, M - 1$, the decoder is able to recover, at every return to state 0, the
packets that arrived between the current and the previous return. If a packet arrives
in state $M$, however, loss occurs. Information in the encoder's memory is overwritten
or corrupted, and will never be recovered. The current contents of the encoder's
memory, however, can still be recovered and, from the point of view of recovering
these contents, the coding system behaves as though we were in state $M$. Hence, to
analyze the performance of the coding scheme, we modify the Markov chain shown
in Figure 2.6 to that in Figure 2.7. Let $x_t$ be the state of this Markov chain at time $t$.
We can interpret $x_t$ as the number of innovative packets the encoder has for sending
at time $t$.

We now proceed to derive some quantities that are useful for designing the
parameters of the coding scheme. We begin with the steady-state probabilities $\pi_i :=
\lim_{t \to \infty} \Pr(x_t = i)$. Since $\{x_t\}$ is a birth-death process, its steady-state probabilities
are readily obtained. We obtain

$$\pi_i = \frac{\varrho^i (1 - \varrho)}{1 - \sigma\varrho^M} \tag{2.8}$$

for $i = 0, 1, \ldots, M - 1$, and

$$\pi_M = \frac{(1 - \sigma)\varrho^M}{1 - \sigma\varrho^M}, \tag{2.9}$$

where $\varrho := \alpha/\beta = r\varepsilon/((1 - r)(1 - \varepsilon))$ and $\sigma := r/(1 - \varepsilon)$. We assume $\varrho < 1$, which is
equivalent to $r < 1 - \varepsilon$, for, if not, the capacity of the outgoing link is exceeded, and
we cannot hope for the coding scheme to be effective.
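The steady-state probabilities can be checked against an explicit transition matrix. The sketch below relies on an assumption about the modified chain that is not spelled out in the text: interior states move up with probability $\alpha$ and down with probability $\beta$, while state $M$ moves down with probability $1 - \varepsilon$, since an arrival in state $M$ cannot raise the level of innovation. This choice is consistent with (2.8) and (2.9) by detailed balance. The function names and parameter values are illustrative.

```python
def steady_state(r, eps, M):
    """Steady-state probabilities pi_0, ..., pi_M from (2.8) and (2.9)."""
    rho = r * eps / ((1 - r) * (1 - eps))
    sigma = r / (1 - eps)
    denom = 1 - sigma * rho ** M
    pi = [rho ** i * (1 - rho) / denom for i in range(M)]
    pi.append((1 - sigma) * rho ** M / denom)
    return pi


def transition_matrix(r, eps, M):
    """Assumed transition matrix of the modified chain, for M >= 1."""
    alpha, beta = r * eps, (1 - r) * (1 - eps)
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M):
        P[i][i + 1] = alpha              # arrival and lost transmission
        if i > 0:
            P[i][i - 1] = beta           # no arrival and successful transmission
        P[i][i] = 1 - alpha - (beta if i > 0 else 0.0)
    P[M][M - 1] = 1 - eps                # in state M only the transmission matters
    P[M][M] = eps
    return P


pi = steady_state(0.8, 0.1, 5)
P = transition_matrix(0.8, 0.1, 5)
# One step of the chain applied to pi; a stationary distribution is unchanged.
pi_next = [sum(pi[i] * P[i][j] for i in range(6)) for j in range(6)]
```

Comparing `pi` with `pi_next` confirms stationarity under the assumed transitions, and summing `pi` confirms that (2.8) and (2.9) are normalized.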

We now derive the probability of packet loss, $p_l$. Evaluating $p_l$ is not straightforward
because, since coded packets depend on each other, the loss of a packet owing
to the encoder exceeding its memory is usually accompanied by other packet losses.
We derive an upper bound on the probability of loss.

A packet is successfully recovered by the decoder if the ensuing path taken in the
Markov chain in Figure 2.7 returns to state 0 without a packet arrival occurring in
state $M$. Let $q_i$ be the probability that a path, originating in state $i$, reaches state 0
without a packet arrival occurring in state $M$. Our problem is very similar to a
random walk, or ruin, problem (see, e.g., jHZl Chapter XIV]). We obtain

$$q_i = \frac{1 - \sigma\varrho^{M-i}}{1 - \sigma\varrho^M}$$

for $i = 0, 1, \ldots, M$.

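Now, after the coding scheme has been running for some time, a random arriving
packet finds the scheme in state $i$ with probability $\pi_i$ and, with probability $1 - \varepsilon$, the
scheme returns to state $i$ after the next packet transmission or, with probability $\varepsilon$, it
moves to state $i + 1$. Hence

$$\begin{aligned}
1 - p_l &\ge \sum_{i=0}^{M-1} \{(1 - \varepsilon) q_i + \varepsilon q_{i+1}\} \pi_i \\
&= \sum_{i=0}^{M-1} \left\{ (1 - \varepsilon) \frac{1 - \sigma\varrho^{M-i}}{1 - \sigma\varrho^M} + \varepsilon \frac{1 - \sigma\varrho^{M-i-1}}{1 - \sigma\varrho^M} \right\} \frac{\varrho^i (1 - \varrho)}{1 - \sigma\varrho^M} \\
&= \frac{1 - \varrho}{(1 - \sigma\varrho^M)^2} \left\{ \frac{1 - \varrho^M}{1 - \varrho} - (1 - \varepsilon) M\sigma\varrho^M - \varepsilon M\sigma\varrho^{M-1} \right\} \\
&= \frac{1}{(1 - \sigma\varrho^M)^2} \{1 - \varrho^M - (1 - 2\varepsilon) M\sigma\varrho^M - \varepsilon M\sigma\varrho^{M-1} + (1 - \varepsilon) M\sigma\varrho^{M+1}\},
\end{aligned}$$

from which we obtain

$$p_l \le \frac{\varrho^{M-1}}{(1 - \sigma\varrho^M)^2} \{\varepsilon M\sigma + (1 - 2\sigma + M\sigma - 2\varepsilon M\sigma)\varrho - (1 - \varepsilon) M\sigma\varrho^2 + \sigma^2\varrho^{M+1}\}. \tag{2.10}$$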
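The closed form (2.10) can be cross-checked against the sum it is derived from. The sketch below evaluates the bound both ways, using the expressions for $\pi_i$ and $q_i$ given above; the function names and the parameter values ($r = 0.8$, $\varepsilon = 0.1$) are illustrative assumptions.

```python
def loss_bound_closed_form(r, eps, M):
    """Upper bound on the loss probability, evaluated from (2.10)."""
    rho = r * eps / ((1 - r) * (1 - eps))
    sigma = r / (1 - eps)
    braces = (eps * M * sigma
              + (1 - 2 * sigma + M * sigma - 2 * eps * M * sigma) * rho
              - (1 - eps) * M * sigma * rho ** 2
              + sigma ** 2 * rho ** (M + 1))
    return rho ** (M - 1) * braces / (1 - sigma * rho ** M) ** 2


def loss_bound_from_sum(r, eps, M):
    """The same bound as 1 - sum_i {(1 - eps) q_i + eps q_{i+1}} pi_i."""
    rho = r * eps / ((1 - r) * (1 - eps))
    sigma = r / (1 - eps)
    denom = 1 - sigma * rho ** M

    def q(i):   # probability of reaching state 0 before an arrival in state M
        return (1 - sigma * rho ** (M - i)) / denom

    def pi(i):  # steady-state probability of state i, for i < M
        return rho ** i * (1 - rho) / denom

    return 1 - sum(((1 - eps) * q(i) + eps * q(i + 1)) * pi(i) for i in range(M))


bounds = {M: loss_bound_closed_form(0.8, 0.1, M) for M in (1, 5, 10, 20)}
```

For these parameters the bound shrinks roughly geometrically in $M$, with ratio $\varrho$.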

We have thus far looked at the limit of $q \to \infty$, while, in reality, $q$ must be finite.
There are two effects of having finite $q$: The first is that, while the encoder may have
innovative information to send to the decoder (i.e., $x_t > 0$), it fails to do so because
the linear combination it chooses is not linearly independent of the combinations
already received by the decoder. For analysis, we can consider such non-innovative
packets to be equivalent to erasures, and we find that the effective erasure rate is

e{l — q~^^). The Markov chain in Figure l?!7l can certainly be modified to account for 

this effective erasure rate, but doing so makes analysis much more tedious. 

The second of the effects is that, when a new packet arrives, it may not increase
the level of innovation at the encoder. When the memory is used as a shift register,
this event arises because a packet is overwritten before it has participated as a linear
factor in any successfully received packets, i.e., all successfully received packets have
had a coefficient of zero for that packet. When the memory is used as an accumulator,
this event arises because the random vector chosen to multiply the new packet is such
that the level of innovation remains constant. The event of the level of innovation
not being increased by a new packet can be quite disastrous, because it is effectively
equivalent to the encoder exceeding its memory. Fortunately, the event seems rare;
in the accumulator case, we can quantify the probability of the event exactly as
I — q^t-M 

To examine the effect of finite $q$, we chose $\varepsilon = 0.1$ and simulated the performance
of the coding scheme for 200,000 packets with various choices of the parameters $r$, $q$,
and $M$ (see Figures 2.8 to 2.11). We decoded using Gaussian elimination on packets as
they were received and used the encoder's memory as a shift register to keep decoding
complexity constant with the length of the packet stream. Delay was evaluated as
the number of epochs between a packet's arrival at the encoder and it being decoded,
neglecting transmission delay. As expected, we see that average loss rate decreases
and average delay increases with increasing $M$; a larger memory results, in a sense, in
more coding, which gives robustness at the expense of delay. Moreover, we see that
a field size q > 2^ (perhaps even q > 2^) is adequate for attaining loss rates close to 
the upper bound for infinite field size.

Figure 2.8: Average loss rate for 200,000 packets as a function of memory size $M$ with
$r = 0.8$, $\varepsilon = 0.1$, and various coding field sizes $q$. The upper bound on the probability
of loss for $q \to \infty$ is also drawn.

Figure 2.9: Average delay for 200,000 packets as a function of memory size $M$ with
$r = 0.8$, $\varepsilon = 0.1$, and various coding field sizes $q$.

Figure 2.10: Average loss rate for 200,000 packets as a function of memory size $M$
with $r = 0.6$, $\varepsilon = 0.1$, and various coding field sizes $q$. The upper bound on the
probability of loss for $q \to \infty$ is also drawn.

Figure 2.11: Average delay for 200,000 packets as a function of memory size $M$ with
$r = 0.6$, $\varepsilon = 0.1$, and various coding field sizes $q$.

2.4.2 Use in a two-link tandem network

When finite-memory random linear coding is used in isolation, packets are sometimes
lost because the decoder receives linear combinations that, although innovative, are
not decodable. For example, suppose the decoder receives $u_1 + u_2$, but is able to
recover neither $u_1$ nor $u_2$ from other packets. This packet, $u_1 + u_2$, definitely gives the
decoder some information, but, without either $u_1$ or $u_2$, the packet must be discarded.
This would not be the case, however, if $u_1$ and $u_2$ were themselves coded packets; a
trivial example, assuming that we are not coding over $\mathbb{F}_2$, is if $u_1 = u_2 = w_1$, where
$w_1$ is a message packet for an outer code.

In this section, we consider finite-memory random linear coding in the context of
a larger coded packet network. We consider the simplest set-up with an intermediate
node: a two-link tandem network (see Figure 2.2), where we wish to establish a unicast
connection from node 1 to node 3. Node 1 and node 3 use the coding scheme described
in Section 2.1 without modification, while node 2 has only $M < K$ memory elements
and uses the modified scheme. This simple two-link tandem network serves as a basis
for longer tandem networks and more general network topologies.

We again discretize the time axis. We assume that, at each epoch, packets are
injected by both nodes 1 and 2 and that they are lost independently with probability $\delta$ and
$\varepsilon$, respectively. Although situations of interest may not have transmissions that are
synchronized in this way, the synchronicity assumption can be relaxed to an extent
by accounting for differences in the packet injection rates using the loss rates.

We again conduct our analysis in the limit of infinite field size. The considerations
for finite field size are the same as those mentioned in Section 2.4.1.

Let $x_t$ denote the number of innovative packets (relative to $u_1, u_2, \ldots, u_N$) node 2
has for sending at time $t$, and let $y_t$ denote the number of innovative packets received
by node 3 at time $t$. By the arguments of Section 2.4.1, the following principles govern
the evolution of $x_t$ and $y_t$ over time:

• As long as $x_t < M$, i.e., the memory does not already have $M$ innovative
packets, node 2 increases the innovation contents of its memory by 1 upon
successful reception of a packet over arc (1,2).

• As long as $x_t > 0$, i.e., the memory is not completely redundant, the output of
node 2 is innovative, so $y_t$ will increase by 1 provided that transmission over (2,3) is
successful.

Figure 2.12: Markov chain modeling the evolution of $x_t$ and $y_t$. To simplify the
diagram, we do not show self-transitions.

Let $\alpha := (1 - \delta)\varepsilon$, $\beta := \delta(1 - \varepsilon)$, and $\zeta := (1 - \delta)(1 - \varepsilon)$. Then the evolution of $x_t$
and $y_t$ is modeled by the Markov chain shown in Figure 2.12, where the horizontal
coordinate of a state indicates $x_t$ and the vertical coordinate corresponds to the
variable $y_t$.

We see that $\{x_t\}$ evolves as in Section 2.4.1, so its steady-state probabilities are
given by (2.8) and (2.9) with $r = 1 - \delta$. Hence, once the system is sufficiently mixed,
the probability that $y_t$ increases at time $t$ is given by

$$0 \cdot \pi_0 + (1 - \varepsilon)\pi_1 + \cdots + (1 - \varepsilon)\pi_M = (1 - \varepsilon)(1 - \pi_0) = (1 - \delta)(1 - \pi_M).$$

Therefore the system can operate at rate

$$R = (1 - \delta)(1 - \pi_M)$$

with high probability of success.
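
The two principles governing $x_t$ and $y_t$ can be checked numerically. The following is a minimal sketch, not the simulation used for the figures: it assumes that both principles are evaluated on the state at the start of each epoch and that $x_t$ decreases by 1 whenever $y_t$ increases. The function name `simulate_tandem` and its parameters are hypothetical.

```python
import random


def simulate_tandem(M, delta, eps, n_epochs, seed=0):
    """Simulate the two-link tandem in the infinite-field limit.

    Returns (y, accepted): y is the number of innovative packets delivered
    to node 3, accepted is the number of arrivals that increased x.
    """
    rng = random.Random(seed)
    x = 0          # innovative packets held in node 2's memory
    y = 0          # innovative packets received by node 3
    accepted = 0
    for _ in range(n_epochs):
        arrival = rng.random() >= delta    # success on arc (1, 2)
        departure = rng.random() >= eps    # success on arc (2, 3)
        # Assumption: both rules look at x at the start of the epoch.
        can_store = x < M
        can_send = x > 0
        if arrival and can_store:
            x += 1
            accepted += 1
        if departure and can_send:
            x -= 1   # assumption: delivered innovation leaves the memory
            y += 1
    return y, accepted


N = 200_000
y, accepted = simulate_tandem(M=1, delta=0.2, eps=0.1, n_epochs=N)
relative_loss = 1 - (y / N) / (1 - 0.2)   # compare with pi_M
```

Because every delivered packet was first accepted, `accepted - y` always lies between 0 and `M`, which is the sample-path counterpart of the identity $(1 - \varepsilon)(1 - \pi_0) = (1 - \delta)(1 - \pi_M)$.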

Suppose, without loss of generality, that $\delta > \varepsilon$, so $\varrho < 1$. Let $R^*$ be the min-cut
capacity, or maximum rate, of the system, which, in this case, is $1 - \delta$. Then the
relative rate loss with respect to the min-cut rate is

$$1 - \frac{R}{R^*} = \pi_M. \tag{2.11}$$

As discussed before, our analysis assumes forming linear combinations over an
infinitely large field, resulting in a Markov chain model with transition probabilities
given in Figure 2.12. If, on the other hand, the field size is finite, we can still find new
expressions for the transition probabilities, although the complete analysis becomes
very complex. In particular, assume that the memory is used as an accumulator, so
that the contents of the memory at each time are uniformly random linear combinations,
over $\mathbb{F}_q$, of the received packets at node 2 by that time. Then, as we have
mentioned, if the innovation content of the memory is $x$ and a new packet arrives at
node 2, the probability that node 2 can increase the innovation of its memory by 1 is
$1 - q^{x-M}$, independently from all other past events. Similarly, the probability that
the output of node 2 is innovative is $1 - q^{-x}$.

To quantify the effect of operations over a finite field, we simulated the evolution
of this Markov chain for two combinations of $\delta$ and $\varepsilon$ values that were also considered
in Section 2.4.1 (see Figures 2.13 and 2.14). The effective rate is considered to be
$R_e := y_N/N$, where $N$ is the number of packet transmissions and, as before, $y_N$
is the number of innovative packets received at node 3 by time $N$. We simulated this
process for $N$ = 10^ packets. For different field sizes, we plot the relative rate loss
with respect to the min-cut rate, i.e., $1 - R_e/R^*$, as a function of the memory size.
Also plotted is the theoretical result from (2.11).

Figure 2.13: Relative rate loss with respect to min-cut rate as a function of memory
size $M$ for $\delta = 0.2$, $\varepsilon = 0.1$, and various coding field sizes $q$.

Figure 2.14: Relative rate loss with respect to min-cut rate as a function of memory
size $M$ for $\delta = 0.4$, $\varepsilon = 0.1$, and various coding field sizes $q$.
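
For small memory sizes, the finite-field chain can also be evaluated without simulation. The sketch below is an illustration under stated assumptions rather than the procedure used for the figures: it assumes that arrival and departure events in an epoch are independent, that the innovation content changes by at most 1 per epoch (so the chain on $x$ is a birth-death chain), and that the min-cut rate is $1 - \max(\delta, \varepsilon)$. The function name `relative_rate_loss` is hypothetical.

```python
def relative_rate_loss(M, delta, eps, q):
    """Relative rate loss 1 - R_e/R* for memory size M and field size q."""
    # a[x]: probability that an arrival raises the innovation content from x.
    a = [(1 - delta) * (1 - q ** (x - M)) for x in range(M + 1)]
    # d[x]: probability that node 2 delivers an innovative packet from state x.
    d = [(1 - eps) * (1 - q ** (-x)) for x in range(M + 1)]
    # Net moves of x: up if only the arrival counts, down if only the departure counts.
    up = [a[x] * (1 - d[x]) for x in range(M + 1)]
    down = [d[x] * (1 - a[x]) for x in range(M + 1)]
    # Detailed balance of a birth-death chain: pi[x+1] * down[x+1] = pi[x] * up[x].
    pi = [1.0]
    for x in range(M):
        pi.append(pi[-1] * up[x] / down[x + 1])
    total = sum(pi)
    pi = [p / total for p in pi]
    rate = sum(p * dx for p, dx in zip(pi, d))   # steady-state rate of increase of y
    min_cut = 1 - max(delta, eps)
    return 1 - rate / min_cut


losses = {q: relative_rate_loss(5, 0.2, 0.1, q) for q in (2, 16, 256)}
```

With a very large `q`, the transition probabilities reduce to $\alpha$ and $\beta$ in the interior states, so the result approaches $\pi_M$ as in (2.11).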

By comparing Figures 2.13 and 2.14 with Figures 2.8 and 2.10, respectively, we
see the advantage that comes from explicitly recognizing that the coding takes place
in the context of a larger coded packet network. The loss rate in the latter plots
essentially equates to the factor $1 - R_e/R^*$ in the former. Thus, in the limit of infinite
$q$, we compare the probability of loss $p_l$, upper bounded by equation (2.10), and the
expression for $1 - R/R^*$ given by equation (2.11). We note that, in both cases, the
decay as $M \to \infty$ is as $\varrho^M$. Moreover, it follows from our discussion that $1 - R/R^*$
must be a lower bound for $p_l$; hence $p_l$ itself decays as $\varrho^M$ as $M \to \infty$.

2.A Appendix: Formal arguments for main result

In this appendix, we give formal arguments for Theorem 2.1. Sections 2.A.1, 2.A.2,
and 2.A.3 give formal arguments for three special cases of Theorem 2.1: the two-link
tandem network, the $L$-link tandem network, and general unicast connections,
respectively.

2.A.1 Two-link tandem network

All packets received by node 2, namely $v_1, v_2, \ldots, v_N$, are considered innovative. We
associate with node 2 the set of vectors $U$, which varies with time and is initially
empty, i.e., $U(0) := \emptyset$. If packet $u$ is received by node 2 at time $\tau$, then its auxiliary
encoding vector $\beta$ is added to $U$ at time $\tau$, i.e., $U(\tau^+) := \{\beta\} \cup U(\tau)$.

We associate with node 3 the set of vectors $W$, which again varies with time and
is initially empty. Suppose packet $u$, with auxiliary encoding vector $\beta$, is received by
node 3 at time $\tau$. Let $\mu$ be a positive integer, which we call the innovation order.
Then we say $u$ is innovative if $\beta \notin \mathrm{span}(W(\tau))$ and $|U(\tau)| > |W(\tau)| + \mu - 1$. If $u$ is
innovative, then $\beta$ is added to $W$ at time $\tau$.¹

¹This definition of innovative differs from merely being informative, which is the sense in which
innovative is used in Section ITU and in j^H]. Indeed, a packet can be informative, in the sense that it
gives a node some new, as yet unknown, information about $w_1, w_2, \ldots, w_N$ (or about $w_1, w_2, \ldots, w_K$),
and not satisfy this definition of innovative. In this appendix, we have defined innovative so that
innovative packets are informative (with respect to other innovative packets at the node), but not
necessarily conversely. This allows us to bound, or dominate, the behavior of the coding scheme,
though we cannot describe it exactly.

The definition of innovative is designed to satisfy two properties: First, we require
that $W(\Delta)$, the set of vectors in $W$ when the scheme terminates, is linearly
independent. Second, we require that, when a packet is received by node 3 and
$|U(\tau)| > |W(\tau)| + \mu - 1$, it is innovative with high probability. The innovation order
$\mu$ is an arbitrary factor that ensures that the latter property is satisfied.

Suppose $|U(\tau)| > |W(\tau)| + \mu - 1$. Since $u$ is a random linear combination of
vectors in $U(\tau)$, it follows that $u$ is innovative with some non-trivial probability.
More precisely, because $\beta$ is uniformly distributed over $q^{|U(\tau)|}$ possibilities, of which
at least $q^{|U(\tau)|} - q^{|W(\tau)|}$ are not in $\mathrm{span}(W(\tau))$, it follows that

$$\Pr(\beta \notin \mathrm{span}(W(\tau))) \ge \frac{q^{|U(\tau)|} - q^{|W(\tau)|}}{q^{|U(\tau)|}} = 1 - q^{|W(\tau)| - |U(\tau)|} \ge 1 - q^{-\mu}.$$

Hence $u$ is innovative with probability at least $1 - q^{-\mu}$. Since we can always discard
innovative packets, we assume that the event occurs with probability exactly $1 - q^{-\mu}$.
If instead $|U(\tau)| \le |W(\tau)| + \mu - 1$, then we see that $u$ cannot be innovative, and
this remains true at least until another arrival occurs at node 2. Therefore, for
an innovation order of $\mu$, the propagation of innovative packets through node 2 is
described by the propagation of jobs through a single-server queueing station with
queue size $(|U(\tau)| - |W(\tau)| - \mu + 1)^+$.

The queueing station is serviced with probability $1 - q^{-\mu}$ whenever the queue is
non-empty and a received packet arrives on arc (2,3). We can equivalently consider
"candidate" packets that arrive with probability $1 - q^{-\mu}$ whenever a received packet
arrives on arc (2,3) and say that the queueing station is serviced whenever the queue
is non-empty and a candidate packet arrives on arc (2,3). We consider all packets
received on arc (1,2) to be candidate packets.

The system we wish to analyze, therefore, is the following simple queueing system:
Jobs arrive at node 2 according to the arrival of received packets on arc (1,2) and,
with the exception of the first $\mu - 1$ jobs, enter node 2's queue. The jobs in node
2's queue are serviced by the arrival of candidate packets on arc (2,3) and exit after
being serviced. The number of jobs exiting is a lower bound on the number of packets
with linearly-independent auxiliary encoding vectors received by node 3.
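
The bound $\Pr(\beta \notin \mathrm{span}(W(\tau))) \ge 1 - q^{-\mu}$, which determines the service probability of this queue, can be checked by exhaustive enumeration in a small case. The sketch below is only an illustration: it assumes $q = 2$, represents vectors as integer bitmasks, and takes the vectors in $U(\tau)$ to be the standard basis of $\mathbb{F}_2^n$, so that $\beta$ ranges uniformly over all $2^n$ bitmasks. The function name `fraction_outside_span` is hypothetical.

```python
def fraction_outside_span(n, w_vectors):
    """Fraction of vectors of F_2^n (as bitmasks) lying outside span(w_vectors)."""
    span = {0}
    for w in w_vectors:
        # Adding a generator doubles the span unless w is already inside it.
        span |= {s ^ w for s in span}
    outside = sum(1 for beta in range(2 ** n) if beta not in span)
    return outside / 2 ** n


# |U| = 6 and |W| = 3 with innovation order mu = 3, so |U| > |W| + mu - 1 holds.
W = [0b000011, 0b000101, 0b001000]
prob = fraction_outside_span(6, W)
bound = 1 - 2 ** (-3)
```

Here the three vectors in `W` are linearly independent, so the span has $2^3$ elements and the bound is met with equality.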

We analyze the queueing system of interest using the fluid approximation for
discrete-flow networks (see, e.g., [221 El]). We do not explicitly account for the fact
that the first $\mu - 1$ jobs arriving at node 2 do not enter its queue because this fact
has no effect on job throughput. Let $B_1$, $B$, and $C$ be the counting processes for the
arrival of received packets on arc (1,2), of innovative packets on arc (2,3), and of
candidate packets on arc (2,3), respectively. Let $Q(\tau)$ be the number of jobs queued
for service at node 2 at time $\tau$. Hence $Q = B_1 - B$. Let $X := B_1 - C$ and $Y := C - B$.
Then

$$Q = X + Y. \tag{2.12}$$

Moreover, we have

$$Q(\tau)\,dY(\tau) = 0, \tag{2.13}$$
$$dY(\tau) \ge 0, \tag{2.14}$$

and

$$Q(\tau) \ge 0 \tag{2.15}$$

for all $\tau \ge 0$, and

$$Y(0) = 0. \tag{2.16}$$

We observe now that equations (2.12)-(2.16) give us the conditions for a Skorohod
problem (see, e.g., [211 Section 7.2]) and, by the oblique reflection mapping theorem,
there is a well-defined, Lipschitz-continuous mapping $\Phi$ such that $Q = \Phi(X)$.



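Let

$$\bar{C}^{(K)}(\tau) := \frac{C(K\tau)}{K}, \qquad \bar{X}^{(K)}(\tau) := \frac{X(K\tau)}{K}, \qquad \text{and} \qquad \bar{Q}^{(K)}(\tau) := \frac{Q(K\tau)}{K}.$$

Recall that $A_{233}$ is the counting process for the arrival of received packets on arc
(2,3). Therefore, $C(\tau)$ is the sum of $A_{233}(\tau)$ Bernoulli-distributed random variables
with parameter $1 - q^{-\mu}$. Hence

$$\bar{C}(\tau) := \lim_{K \to \infty} \bar{C}^{(K)}(\tau) = \lim_{K \to \infty} (1 - q^{-\mu}) \frac{A_{233}(K\tau)}{K} = (1 - q^{-\mu}) z_{233} \tau \quad \text{a.s.},$$

where the last equality follows by the assumptions of the model. Therefore

$$\bar{X}(\tau) := \lim_{K \to \infty} \bar{X}^{(K)}(\tau) = (z_{122} - (1 - q^{-\mu}) z_{233}) \tau \quad \text{a.s.}$$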

By the Lipschitz-continuity of $\Phi$, then, it follows that $\bar{Q} := \lim_{K \to \infty} \bar{Q}^{(K)} = \Phi(\bar{X})$,
i.e., $\bar{Q}$ is, almost surely, the unique $\bar{Q}$ that satisfies, for some $\bar{Y}$,

$$\bar{Q}(\tau) = (z_{122} - (1 - q^{-\mu}) z_{233}) \tau + \bar{Y}(\tau), \tag{2.17}$$
$$\bar{Q}(\tau)\,d\bar{Y}(\tau) = 0, \tag{2.18}$$
$$d\bar{Y}(\tau) \ge 0, \tag{2.19}$$

and

$$\bar{Q}(\tau) \ge 0 \tag{2.20}$$

for all $\tau \ge 0$, and

$$\bar{Y}(0) = 0. \tag{2.21}$$

A pair $(\bar{Q}, \bar{Y})$ that satisfies (2.17)-(2.21) is

$$\bar{Q}(\tau) = (z_{122} - (1 - q^{-\mu}) z_{233})^+ \tau \tag{2.22}$$

and

$$\bar{Y}(\tau) = (z_{122} - (1 - q^{-\mu}) z_{233})^- \tau,$$

where, for a real number $x$, $(x)^+ := \max(x, 0)$ and $(x)^- := \max(-x, 0)$. Hence $\bar{Q}$ is
given by equation (2.22).

Recall that node 3 can recover the message packets with high probability if it
receives $\lfloor K(1 + \varepsilon) \rfloor$ packets with linearly-independent auxiliary encoding vectors and
that the number of jobs exiting the queueing system is a lower bound on the number
of packets with linearly-independent auxiliary encoding vectors received by node 3.
Therefore, node 3 can recover the message packets with high probability if $\lfloor K(1 + \varepsilon) \rfloor$
or more jobs exit the queueing system. Let $\nu$ be the number of jobs that have exited
the queueing system by time $\Delta$. Then

$$\nu = B_1(\Delta) - Q(\Delta).$$



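Take $K = \lceil (1 - q^{-\mu}) \Delta R_c R / (1 + \varepsilon) \rceil$, where $0 < R_c < 1$. Then

$$\lim_{K \to \infty} \frac{\nu}{\lfloor K(1 + \varepsilon) \rfloor} = \lim_{K \to \infty} \frac{B_1(\Delta) - Q(\Delta)}{K(1 + \varepsilon)} = \frac{z_{122} - (z_{122} - (1 - q^{-\mu}) z_{233})^+}{(1 - q^{-\mu}) R_c R} = \frac{\min(z_{122}, (1 - q^{-\mu}) z_{233})}{(1 - q^{-\mu}) R_c R} \ge \frac{1}{R_c} \cdot \frac{\min(z_{122}, z_{233})}{R} > 1$$

provided that

$$R \le \min(z_{122}, z_{233}). \tag{2.23}$$

Hence, for all $R$ satisfying (2.23), $\nu \ge \lfloor K(1 + \varepsilon) \rfloor$ with probability arbitrarily close
to 1 for $K$ sufficiently large. The rate achieved is

$$\frac{K}{\Delta} \ge \frac{(1 - q^{-\mu}) R_c}{1 + \varepsilon} R,$$

which can be made arbitrarily close to $R$ by varying $\mu$, $R_c$, and $\varepsilon$.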
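
The fluid limits above can be compared against a direct simulation of the queue. The following sketch is an illustration under assumptions that are not part of the formal argument: time is slotted, at most one received packet arrives on each arc per slot, and arrivals are independent Bernoulli events with rates $z_{122}$ and $z_{233}$. As in the analysis, the first $\mu - 1$ jobs are not treated specially. The function name `simulate_queue` is hypothetical.

```python
import random


def simulate_queue(z122, z233, q, mu, n_slots, seed=0):
    """Return (throughput, queue growth rate) of the single-server queue at node 2."""
    rng = random.Random(seed)
    p_candidate = (1 - q ** (-mu)) * z233   # candidate-packet rate on arc (2, 3)
    queue = 0
    exited = 0
    for _ in range(n_slots):
        if rng.random() < z122:             # received packet on arc (1, 2): job arrives
            queue += 1
        if queue > 0 and rng.random() < p_candidate:
            queue -= 1                      # job serviced by a candidate packet
            exited += 1
    return exited / n_slots, queue / n_slots


throughput, growth = simulate_queue(0.6, 0.5, q=2, mu=3, n_slots=200_000)
# Fluid predictions: min(z122, (1 - q**-mu) * z233) and (z122 - (1 - q**-mu) * z233)^+
```

For these parameters the fluid analysis predicts a throughput of $0.4375$ and a queue growth rate of $0.1625$, matching $\min(z_{122}, (1 - q^{-\mu}) z_{233})$ and equation (2.22).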



2.A.2 $L$-link tandem network

For $i = 2, 3, \ldots, L + 1$, we associate with node $i$ the set of vectors $V_i$, which varies
with time and is initially empty. We define $U := V_2$ and $W := V_{L+1}$. As in the case of
the two-link tandem, all packets received by node 2 are considered innovative and, if
packet $u$ is received by node 2 at time $\tau$, then its auxiliary encoding vector $\beta$ is added
to $U$ at time $\tau$. For $i = 3, 4, \ldots, L + 1$, if packet $u$, with auxiliary encoding vector $\beta$,
is received by node $i$ at time $\tau$, then we say $u$ is innovative if $\beta \notin \mathrm{span}(V_i(\tau))$ and
$|V_{i-1}(\tau)| > |V_i(\tau)| + \mu - 1$. If $u$ is innovative, then $\beta$ is added to $V_i$ at time $\tau$.

This definition of innovative is a straightforward extension of that in Appendix 2.A.1.
The first property remains the same: we continue to require that $W(\Delta)$ is a set of
linearly-independent vectors. We extend the second property so that, when a packet
is received by node $i$ for any $i = 3, 4, \ldots, L + 1$ and $|V_{i-1}(\tau)| > |V_i(\tau)| + \mu - 1$, it is
innovative with high probability.

Take some $i \in \{3, 4, \ldots, L + 1\}$. Suppose that packet $u$, with auxiliary encoding
vector $\beta$, is received by node $i$ at time $\tau$ and that $|V_{i-1}(\tau)| > |V_i(\tau)| + \mu - 1$. Thus,
the auxiliary encoding vector $\beta$ is a random linear combination of vectors in some
set $V_0$ that contains $V_{i-1}(\tau)$. Hence, because $\beta$ is uniformly distributed over $q^{|V_0|}$
possibilities, of which at least $q^{|V_0|} - q^{|V_i(\tau)|}$ are not in $\mathrm{span}(V_i(\tau))$, it follows that

$$\Pr(\beta \notin \mathrm{span}(V_i(\tau))) \ge \frac{q^{|V_0|} - q^{|V_i(\tau)|}}{q^{|V_0|}} = 1 - q^{|V_i(\tau)| - |V_0|} \ge 1 - q^{|V_i(\tau)| - |V_{i-1}(\tau)|} \ge 1 - q^{-\mu}.$$

Therefore $u$ is innovative with probability at least $1 - q^{-\mu}$.

Following the argument in Appendix 2.A.1, we see, for all $i = 2, 3, \ldots, L$, that the
propagation of innovative packets through node $i$ is described by the propagation of
jobs through a single-server queueing station with queue size $(|V_i(\tau)| - |V_{i+1}(\tau)| - \mu + 1)^+$
and that the queueing station is serviced with probability $1 - q^{-\mu}$ whenever the
queue is non-empty and a received packet arrives on arc $(i, i + 1)$. We again consider
candidate packets that arrive with probability $1 - q^{-\mu}$ whenever a received packet
arrives on arc $(i, i + 1)$ and say that the queueing station is serviced whenever the
queue is non-empty and a candidate packet arrives on arc $(i, i + 1)$.

The system we wish to analyze in this case is therefore the following simple queueing
network: Jobs arrive at node 2 according to the arrival of received packets on arc
(1,2) and, with the exception of the first $\mu - 1$ jobs, enter node 2's queue. For
$i = 2, 3, \ldots, L - 1$, the jobs in node $i$'s queue are serviced by the arrival of candidate
packets on arc $(i, i + 1)$ and, with the exception of the first $\mu - 1$ jobs, enter node
$(i + 1)$'s queue after being serviced. The jobs in node $L$'s queue are serviced by the
arrival of candidate packets on arc $(L, L + 1)$ and exit after being serviced. The number
of jobs exiting is a lower bound on the number of packets with linearly-independent
auxiliary encoding vectors received by node $L + 1$.

We again analyze the queueing network of interest using the fluid approximation
for discrete-flow networks, and we again do not explicitly account for the fact that
the first $\mu - 1$ jobs arriving at a queueing node do not enter its queue. Let $B_1$ be the
counting process for the arrival of received packets on arc (1,2). For $i = 2, 3, \ldots, L$,
let $B_i$ and $C_i$ be the counting processes for the arrival of innovative packets and
candidate packets on arc $(i, i + 1)$, respectively. Let $Q_i(\tau)$ be the number of jobs
queued for service at node $i$ at time $\tau$. Hence, for $i = 2, 3, \ldots, L$, $Q_i = B_{i-1} - B_i$.
Let $X_i := C_{i-1} - C_i$ and $Y_i := C_i - B_i$, where $C_1 := B_1$. Then, we obtain a Skorohod
problem with the following conditions: For all $i = 2, 3, \ldots, L$,

$$Q_i = X_i - Y_{i-1} + Y_i.$$

For all $\tau \ge 0$ and $i = 2, 3, \ldots, L$,

$$Q_i(\tau)\,dY_i(\tau) = 0,$$
$$dY_i(\tau) \ge 0,$$

and

$$Q_i(\tau) \ge 0.$$

For all $i = 2, 3, \ldots, L$,

$$Y_i(0) = 0.$$

Let

$$\bar{Q}_i^{(K)}(\tau) := \frac{Q_i(K\tau)}{K}$$

and $\bar{Q}_i := \lim_{K \to \infty} \bar{Q}_i^{(K)}$ for $i = 2, 3, \ldots, L$. Then the vector $\bar{Q}$ is, almost surely, the
unique $\bar{Q}$ that satisfies, for some $\bar{Y}$,

$$\bar{Q}_i(\tau) = \begin{cases} (z_{122} - (1 - q^{-\mu}) z_{233}) \tau + \bar{Y}_2(\tau) & \text{if } i = 2, \\ (1 - q^{-\mu})(z_{(i-1)ii} - z_{i(i+1)(i+1)}) \tau + \bar{Y}_i(\tau) - \bar{Y}_{i-1}(\tau) & \text{otherwise,} \end{cases} \tag{2.24}$$
$$\bar{Q}_i(\tau)\,d\bar{Y}_i(\tau) = 0, \tag{2.25}$$
$$d\bar{Y}_i(\tau) \ge 0, \tag{2.26}$$

and

$$\bar{Q}_i(\tau) \ge 0 \tag{2.27}$$

for all $\tau \ge 0$ and $i = 2, 3, \ldots, L$, and

$$\bar{Y}_i(0) = 0 \tag{2.28}$$

for all $i = 2, 3, \ldots, L$.

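A pair $(\bar{Q}, \bar{Y})$ that satisfies (2.24)-(2.28) is

$$\bar{Q}_i(\tau) = \Bigl(\min\bigl(z_{122}, \min_{2 \le j < i} \{(1 - q^{-\mu}) z_{j(j+1)(j+1)}\}\bigr) - (1 - q^{-\mu}) z_{i(i+1)(i+1)}\Bigr)^+ \tau \tag{2.29}$$

and

$$\bar{Y}_i(\tau) = \Bigl(\min\bigl(z_{122}, \min_{2 \le j < i} \{(1 - q^{-\mu}) z_{j(j+1)(j+1)}\}\bigr) - (1 - q^{-\mu}) z_{i(i+1)(i+1)}\Bigr)^- \tau.$$

Hence $\bar{Q}$ is given by equation (2.29).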

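The number of jobs that have exited the queueing network by time $\Delta$ is given by

$$\nu = B_1(\Delta) - \sum_{i=2}^{L} Q_i(\Delta).$$

Take $K = \lceil (1 - q^{-\mu}) \Delta R_c R / (1 + \varepsilon) \rceil$, where $0 < R_c < 1$. Then

$$\lim_{K \to \infty} \frac{\nu}{\lfloor K(1 + \varepsilon) \rfloor} = \lim_{K \to \infty} \frac{B_1(\Delta) - \sum_{i=2}^{L} Q_i(\Delta)}{K(1 + \varepsilon)} = \frac{\min\bigl(z_{122}, \min_{2 \le i \le L} \{(1 - q^{-\mu}) z_{i(i+1)(i+1)}\}\bigr)}{(1 - q^{-\mu}) R_c R} \ge \frac{1}{R_c} \cdot \frac{\min_{1 \le i \le L} \{z_{i(i+1)(i+1)}\}}{R} > 1 \tag{2.30}$$

provided that

$$R \le \min_{1 \le i \le L} \{z_{i(i+1)(i+1)}\}. \tag{2.31}$$

Hence, for all $R$ satisfying (2.31), $\nu \ge \lfloor K(1 + \varepsilon) \rfloor$ with probability arbitrarily close
to 1 for $K$ sufficiently large. The rate can again be made arbitrarily close to $R$ by
varying $\mu$, $R_c$, and $\varepsilon$.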
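
The fluid solution (2.29) and the resulting exit rate can be evaluated link by link. The sketch below is an illustrative calculation only: the list `z` is assumed to hold the rates $z_{122}, z_{233}, \ldots, z_{L(L+1)(L+1)}$ in order, and the function name `fluid_limits` is hypothetical.

```python
def fluid_limits(z, q, mu):
    """Evaluate the fluid queue growth rates of (2.29) and the exit rate.

    z[0] is the rate on arc (1, 2), z[1] the rate on arc (2, 3), and so on.
    Returns (growth, throughput), where growth[k] belongs to node k + 2.
    """
    s = 1 - q ** (-mu)
    inflow = z[0]                       # rate of jobs entering node 2's queue
    growth = []
    for z_out in z[1:]:
        service = s * z_out             # candidate-packet rate on the outgoing arc
        growth.append(max(inflow - service, 0.0))
        inflow = min(inflow, service)   # rate of jobs passed on to the next node
    return growth, inflow


z = [0.6, 0.8, 0.4, 0.9]
growth, throughput = fluid_limits(z, q=4, mu=2)
```

The returned throughput equals $z_{122}$ minus the sum of the queue growth rates, which is the numerator in (2.30): the slowest effective link determines the rate at which jobs exit.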

2.A.3 General unicast connection

Consider a single path $p_m$. We write $p_m = \{i_1, i_2, \ldots, i_{L_m}, i_{L_m+1}\}$, where $i_1 = s$ and
$i_{L_m+1} = t$. For $l = 2, 3, \ldots, L_m + 1$, we associate with node $i_l$ the set of vectors
$V_l^{(p_m)}$, which varies with time and is initially empty. We define $U^{(p_m)} := V_2^{(p_m)}$ and
$W^{(p_m)} := V_{L_m+1}^{(p_m)}$.

We note that the constraint (2.3) can also be written as

$$x_{iJj} \le \sum_{\{L \subset J \mid j \in L\}} \alpha_{iJL}^{(j)} z_{iJL}$$

for all $(i, J) \in A$ and $j \in J$, where $\sum_{j \in L} \alpha_{iJL}^{(j)} = 1$ for all $(i, J) \in A$ and $L \subset J$,
and $\alpha_{iJL}^{(j)} \ge 0$ for all $(i, J) \in A$, $L \subset J$, and $j \in L$. Suppose packet $u$, with auxiliary
encoding vector $\beta$, is placed on hyperarc $(i_1, J)$ and received by $K \subset J$, where $K \ni i_2$,
at time $\tau$. We associate with $u$ the independent random variable $P_u$, which takes the
value $m$ with probability $R_m \alpha_{i_1 J K}^{(i_2)} / \sum_{\{L \subset J \mid i_2 \in L\}} \alpha_{i_1 J L}^{(i_2)} z_{i_1 J L}$. If $P_u = m$, then we say $u$
is innovative on path $p_m$, and $\beta$ is added to $U^{(p_m)}$ at time $\tau$.

Take $l = 2, 3, \ldots, L_m$. Now suppose packet $u$, with auxiliary encoding vector
$\beta$, is placed on hyperarc $(i_l, J)$ and received by $K \subset J$, where $K \ni i_{l+1}$, at time
$\tau$. We associate with $u$ the independent random variable $P_u$, which takes the value
$m$ with probability $R_m \alpha_{i_l J K}^{(i_{l+1})} / \sum_{\{L \subset J \mid i_{l+1} \in L\}} \alpha_{i_l J L}^{(i_{l+1})} z_{i_l J L}$. We say $u$ is innovative on
path $p_m$ if $P_u = m$, $\beta \notin \mathrm{span}\bigl(\bigcup_{n=1}^{m-1} W^{(p_n)}(\Delta) \cup V_{l+1}^{(p_m)}(\tau) \cup \bigcup_{n=m+1}^{M} U^{(p_n)}(\Delta)\bigr)$, and
$|V_l^{(p_m)}(\tau)| > |V_{l+1}^{(p_m)}(\tau)| + \mu - 1$.

This definition of innovative is somewhat more complicated than that in Appendices
2.A.1 and 2.A.2 because we now have $M$ paths that we wish to analyze
separately. We have again designed the definition to satisfy two properties: First, we
require that $\bigcup_{m=1}^{M} W^{(p_m)}(\Delta)$ is linearly independent. This is easily verified: Vectors
are added to $W^{(p_1)}(\tau)$ only if they are linearly independent of existing ones; vectors
are added to $W^{(p_2)}(\tau)$ only if they are linearly independent of existing ones and ones
in $W^{(p_1)}(\Delta)$; and so on. Second, we require that, when a packet is received by node
$i_l$, $P_u = m$, and $|V_{l-1}^{(p_m)}(\tau)| > |V_l^{(p_m)}(\tau)| + \mu - 1$, it is innovative on path $p_m$ with high
probability.

Take $l \in \{3, 4, \ldots, L_m + 1\}$. Suppose that packet $u$, with auxiliary encoding vector
$\beta$, is received by node $i_l$ at time $\tau$, that $P_u = m$, and that $|V_{l-1}^{(p_m)}(\tau)| > |V_l^{(p_m)}(\tau)| + \mu - 1$.
Thus, the auxiliary encoding vector $\beta$ is a random linear combination of
vectors in some set $V_0$ that contains $V_{l-1}^{(p_m)}(\tau)$. Hence $\beta$ is uniformly distributed
over $q^{|V_0|}$ possibilities, of which at least $q^{|V_0|} - q^d$ are not in $\mathrm{span}(V_l^{(p_m)}(\tau) \cup \tilde{V}_{\setminus m})$,
where $\tilde{V}_{\setminus m} := \bigcup_{n=1}^{m-1} W^{(p_n)}(\Delta) \cup \bigcup_{n=m+1}^{M} U^{(p_n)}(\Delta)$ collects the vectors of the other paths
that appear in the definition of innovative, and $d := \dim(\mathrm{span}(V_0) \cap \mathrm{span}(V_l^{(p_m)}(\tau) \cup \tilde{V}_{\setminus m}))$.
Note that $V_{l-1}^{(p_m)}(\tau) \cup \tilde{V}_{\setminus m}$ forms a linearly-independent set, so

$$\begin{aligned} d - |V_0| &\le \dim(\mathrm{span}(V_{l-1}^{(p_m)}(\tau)) \cap \mathrm{span}(V_l^{(p_m)}(\tau) \cup \tilde{V}_{\setminus m})) - |V_{l-1}^{(p_m)}(\tau)| \\ &= \dim(\mathrm{span}(V_{l-1}^{(p_m)}(\tau)) \cap \mathrm{span}(V_l^{(p_m)}(\tau))) - |V_{l-1}^{(p_m)}(\tau)| \\ &\le |V_l^{(p_m)}(\tau)| - |V_{l-1}^{(p_m)}(\tau)| \le -\mu. \end{aligned}$$

Therefore, it follows that

$$\Pr(\beta \notin \mathrm{span}(V_l^{(p_m)}(\tau) \cup \tilde{V}_{\setminus m})) \ge \frac{q^{|V_0|} - q^d}{q^{|V_0|}} = 1 - q^{d - |V_0|} \ge 1 - q^{-\mu}.$$

We see then that, if we consider only those packets such that $P_u = m$, the conditions
that govern the propagation of innovative packets are exactly those of an
$L_m$-link tandem network, which we dealt with in Appendix 2.A.2. By recalling the
distribution of $P_u$, it follows that the propagation of innovative packets along path
$p_m$ behaves like an $L_m$-link tandem network with average arrival rate $R_m$ on every
link. Since we have assumed nothing special about $m$, this statement applies for all
$m = 1, 2, \ldots, M$.

Take $K = \lceil (1 - q^{-\mu}) \Delta R_c R / (1 + \varepsilon) \rceil$, where $0 < R_c < 1$. Then, by equation (2.30),

$$\lim_{K \to \infty} \frac{|W^{(p_m)}(\Delta)|}{\lfloor K(1 + \varepsilon) \rfloor} > \frac{R_m}{R}.$$

Hence

$$\lim_{K \to \infty} \frac{\bigl|\bigcup_{m=1}^{M} W^{(p_m)}(\Delta)\bigr|}{\lfloor K(1 + \varepsilon) \rfloor} = \sum_{m=1}^{M} \lim_{K \to \infty} \frac{|W^{(p_m)}(\Delta)|}{\lfloor K(1 + \varepsilon) \rfloor} > \sum_{m=1}^{M} \frac{R_m}{R} = 1.$$

As before, the rate can be made arbitrarily close to $R$ by varying $\mu$, $R_c$, and $\varepsilon$.



Chapter 3 
Subgraph Selection 



We now turn to the subgraph selection part of the efficient operation problem.
This is the problem of determining the coding subgraph to use given
that the network code is decided. In our case, we assume that the network code is
given by the scheme examined in the previous chapter. Since this scheme achieves
the capacity of a single multicast connection in a given subgraph, in using it and
determining the coding subgraph independently, there is no loss of optimality in the
efficient operation problem provided that we are constrained to only coding packets
within a single connection.¹ Relaxing this constraint, and allowing coded packets
to be formed using packets from two or more connections, is known to afford an
improvement, but finding capacity-achieving codes is a very difficult problem, one
that, in fact, currently remains open with only cumbersome bounds on the capability
of coding jnn] and examples that demonstrate the insufficiency of various classes of
linear codes P^I82[l^ l93j. Constraining coding to packets within a single connection
is called superposition coding [115], and there is evidence to suggest that it may be
near-optimal [HH]. We therefore content ourselves with coding only within a single
connection, allowing us to separate network coding from subgraph selection without
loss of optimality.

¹This statement assumes that no information is conveyed by the timing of packets. In general, the
timing of packets can be used to convey information, but the amount of information communicated
by timing does not grow in the size of packets, so the effect of such "timing channels" is negligible
for large packet sizes.

We formulate the subgraph selection problem in Section 3.1. The problem we
describe is a rich one, and the direction we take is simply the one that we believe is
most appropriate. Certainly, there are many more directions to take, and our work
has led to follow-on work that extends the problem and explores other facets of it
(see, e.g., O dl IHHl IMl IM 11121 ESI). In Section 3.2, we discuss distributed
algorithms for solving the problem. Such algorithms allow subgraphs to be computed
in a distributed manner, with each node making computations based only on local
knowledge and knowledge acquired from information exchanges. Perhaps the most
well-known distributed algorithm in networking is the distributed Bellman-Ford
algorithm (see, e.g., Section 5.2]), which is used to find routes in routed packet
networks. Designing algorithms that can be run in a distributed manner is not an
easy task and, though we do manage to do so, they apply only in cases where links
essentially behave independently and medium access issues do not pose significant
constraints, either because they are non-existent or because they are dealt with
separately (in contrast to the slotted Aloha relay channel of Section 1.2.1, where medium
access issues form a large part of the problem and must be dealt with directly). In
Section 3.3, we introduce a dynamic component into the problem. Dynamics, such
as changes in the membership of the multicast group or changes in the positions of
the nodes, are often present in problems of interest. We consider the scenario where
membership of the multicast group changes in time, with nodes joining and leaving
the group, and continuous service to all members of the group must be maintained,
a problem we call dynamic multicast.

3.1 Problem formulation 

We specify a multicast connection with a triplet $(s, T, \{R_t\}_{t \in T})$, where $s$ is the source
of the connection, $T$ is the set of sinks, and $\{R_t\}_{t \in T}$ is the set of rates to the
sinks (see Section 2.2.2). Suppose we wish to establish $C$ multicast connections,
$(s_1, T_1, \{R_{t,1}\}), \ldots, (s_C, T_C, \{R_{t,C}\})$. Using Theorem 2.1 and the max-flow/min-cut
theorem, we see that the efficient operation problem can now be phrased as the
following mathematical programming problem:

$$\begin{aligned} \text{minimize } & f(z) \\ \text{subject to } & z \in Z, \\ & \sum_{c=1}^{C} y_{iJK}^{(c)} \le z_{iJK}, \quad \forall\, (i, J) \in A,\ K \subset J, \\ & \sum_{j \in K} x_{iJj}^{(t,c)} \le \sum_{\{L \subset J \mid L \cap K \ne \emptyset\}} y_{iJL}^{(c)}, \quad \forall\, (i, J) \in A,\ K \subset J,\ t \in T_c,\ c = 1, \ldots, C, \\ & x^{(t,c)} \in F^{(t,c)}, \quad \forall\, t \in T_c,\ c = 1, \ldots, C, \end{aligned} \tag{3.1}$$

where $x^{(t,c)}$ is the vector consisting of $x_{iJj}^{(t,c)}$, $(i, J) \in A$, $j \in J$, and $F^{(t,c)}$ is the bounded
polyhedron of points $x^{(t,c)}$ satisfying the conservation of flow constraints

$$\sum_{\{J \mid (i,J) \in A\}} \sum_{j \in J} x_{iJj}^{(t,c)} - \sum_{\{j \mid (j,I) \in A,\ i \in I\}} x_{jIi}^{(t,c)} = \begin{cases} R_{t,c} & \text{if } i = s_c, \\ -R_{t,c} & \text{if } i = t, \\ 0 & \text{otherwise,} \end{cases} \quad \forall\, i \in \mathcal{N},$$

and non-negativity constraints

$$x_{iJj}^{(t,c)} \ge 0, \quad \forall\, (i, J) \in A,\ j \in J.$$

In this formulation, $y_{iJK}^{(c)}$ represents the average rate of packets that are injected on
hyperarc $(i, J)$ and received by exactly the set of nodes $K$ (which occurs with average
rate $z_{iJK}$) and that are allocated to connection $c$.

For simplicity, let us consider the case where $C = 1$. The extension to $C > 1$ is
conceptually straightforward and, moreover, the case where $C = 1$ is interesting in
its own right: whenever each multicast group has a selfish cost objective, or when the
network sets link weights to meet its objective or enforce certain policies and each
multicast group is subject to a minimum-weight objective, we wish to establish single
efficient multicast connections.

Let

$$b_{iJK} := \frac{\sum_{\{L \subset J \mid L \cap K \ne \emptyset\}} z_{iJL}}{z_{iJ}},$$

which is the fraction of packets injected on hyperarc $(i, J)$ that are received by a set
of nodes that intersects $K$. Problem (3.1) is now

$$\begin{aligned} \text{minimize } & f(z) \\ \text{subject to } & z \in Z, \\ & \sum_{j \in K} x_{iJj}^{(t)} \le z_{iJ} b_{iJK}, \quad \forall\, (i, J) \in A,\ K \subset J,\ t \in T, \\ & x^{(t)} \in F^{(t)}, \quad \forall\, t \in T. \end{aligned} \tag{3.2}$$

In the lossless case, problem (3.2) simplifies to the following problem:

$$\begin{aligned} \text{minimize } & f(z) \\ \text{subject to } & z \in Z, \\ & \sum_{j \in J} x_{iJj}^{(t)} \le z_{iJ}, \quad \forall\, (i, J) \in A,\ t \in T, \\ & x^{(t)} \in F^{(t)}, \quad \forall\, t \in T. \end{aligned} \tag{3.3}$$

As an example, consider the network depicted in Figure 3.1, which consists only
of point-to-point links. Suppose that the network is lossless, that we wish to achieve
multicast of unit rate from $s$ to two sinks, $t_1$ and $t_2$, and that we have $Z = [0, 1]^{|A|}$
and $f(z) = \sum_{(i,j) \in A} z_{ij}$. An optimal solution to problem (3.3) is shown in the figure.
We have flows $x^{(1)}$ and $x^{(2)}$ of unit size from $s$ to $t_1$ and $t_2$, respectively, and, for each
arc $(i, j)$, $z_{ij} = \max(x_{ij}^{(1)}, x_{ij}^{(2)})$, as we expect from the optimization.
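
The relation $z_{ij} = \max(x_{ij}^{(1)}, x_{ij}^{(2)})$ can be illustrated on a toy instance. The sketch below does not solve problem (3.3); it assumes the per-sink flows are already given and only derives the cheapest feasible $z$ for them under the linear cost $\sum z_{ij}$. The three-arc network is hypothetical and is not the network of Figure 3.1; the function name `coding_subgraph` is also an assumption.

```python
def coding_subgraph(flows):
    """Given per-sink flows {sink: {arc: rate}}, return z with z[arc] = max over sinks."""
    z = {}
    for x in flows.values():
        for arc, rate in x.items():
            z[arc] = max(z.get(arc, 0.0), rate)
    return z


# Hypothetical network: s -> a, a -> t1, a -> t2, with unit-rate flows to each sink.
flows = {
    "t1": {("s", "a"): 1.0, ("a", "t1"): 1.0},
    "t2": {("s", "a"): 1.0, ("a", "t2"): 1.0},
}
z = coding_subgraph(flows)
cost = sum(z.values())
```

The shared arc from `s` to `a` needs rate 1 rather than 2 because the constraint in (3.3) applies to each sink's flow separately, so the total cost here is 3.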

The same multicast problem in a routed packet network would entail minimizing
the number of arcs used to form a tree that is rooted at $s$ and that reaches $t_1$ and $t_2$;
in other words, solving the Steiner tree problem on directed graphs. The Steiner
tree problem on directed graphs is well-known to be NP-complete, but solving problem
(3.3) is not. In this case, problem (3.3) is in fact a linear optimization problem. It
is a linear optimization problem that can be thought of as a fractional relaxation of
the Steiner tree problem [117]. This example illustrates one of the attractive features
of the coded approach: it allows us to avoid an NP-complete problem and instead solve
its fractional relaxation. In Section ^21 we examine the efficiency improvements that
we can achieve from this feature.

Figure 3.1: A network of lossless point-to-point links with multicast from $s$ to $T = \{t_1, t_2\}$.
Each arc is marked with the triple $(z_{ij}, x_{ij}^{(1)}, x_{ij}^{(2)})$.

Figure 3.2: A network of lossless broadcast links with multicast from $s$ to $T = \{t_1, t_2\}$.
Each hyperarc is marked with $z_{iJ}$ at its start and the pair $(x_{iJj}^{(1)}, x_{iJj}^{(2)})$ at its ends.
For an example with broadcast links, consider the network depicted in Figure 3.2.
Suppose again that the network is lossless, that we wish to achieve multicast of unit
rate from $s$ to two sinks, $t_1$ and $t_2$, and that we have $Z = [0, 1]^{|A|}$ and $f(z) = \sum_{(i,J) \in A} z_{iJ}$.
An optimal solution to problem (3.3) is shown in the figure. We still
have flows $x^{(1)}$ and $x^{(2)}$ of unit size from $s$ to $t_1$ and $t_2$, respectively, but now, for each
hyperarc $(i, J)$, we determine $z_{iJ}$ from the various flows passing through hyperarc
$(i, J)$, each destined toward a single node $j$ in $J$, and the optimization gives

$$z_{iJ} = \max\Bigl(\sum_{j \in J} x_{iJj}^{(1)}, \sum_{j \in J} x_{iJj}^{(2)}\Bigr).$$

Neither problem (3.2) nor (3.3) as it stands is easy to solve. But the problems
are very general. Their complexities improve if we assume that the cost function is
separable and possibly even linear, i.e., if we suppose $f(z) = \sum_{(i,J) \in A} f_{iJ}(z_{iJ})$, where
$f_{iJ}$ is a convex or linear function, which is a very reasonable assumption in many
practical situations. For example, packet latency is usually assessed with a separable,
convex cost function and energy, monetary cost, and total weight are usually assessed
with separable, linear cost functions. The problems examined in our performance
evaluation in Chapter |1J, which we believe reflect problems of practical interest, all
involve separable, linear cost functions.

The complexities of problems (3.2) and (3.3) also improve if we make some
assumptions on the form of the constraint set $Z$, which is the case in most practical
situations.

A particular simplification applies if we assume that, when nodes transmit in a
lossless network, they reach all nodes in a certain region, with cost increasing as this
region is expanded. This applies, for example, if we are interested in minimizing
energy consumption, and the region in which a packet is reliably received expands
as we expend more energy in its transmission. More precisely, suppose that we have
separable cost, so $f(z) = \sum_{(i,J) \in A} f_{iJ}(z_{iJ})$. Suppose further that each node $i$ has $M_i$
outgoing hyperarcs $(i, J_1^{(i)}), (i, J_2^{(i)}), \ldots, (i, J_{M_i}^{(i)})$ with $J_1^{(i)} \subsetneq J_2^{(i)} \subsetneq \cdots \subsetneq J_{M_i}^{(i)}$. (We
assume that there are no identical links, as duplicate links can effectively be treated
as a single link.) Then, we assume that $f_{iJ_1^{(i)}}(\zeta) < f_{iJ_2^{(i)}}(\zeta) < \cdots < f_{iJ_{M_i}^{(i)}}(\zeta)$ for
all $\zeta \ge 0$ and nodes $i$.

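Let us introduce, for $(i, j) \in A' := \{(i, j) \mid (i, J) \in A,\ J \ni j\}$, the variables

$$\hat{x}_{ij}^{(t)} := \sum_{m = m(i,j)}^{M_i} x_{iJ_m^{(i)}j}^{(t)},$$

where $m(i, j)$ is the unique $m$ such that $j \in J_m^{(i)} \setminus J_{m-1}^{(i)}$ (we define $J_0^{(i)} := \emptyset$ for all
$i \in \mathcal{N}$ for convenience). Now, problem (3.3) can be reformulated as the following
problem, which has substantially fewer variables:

$$\begin{aligned} \text{minimize } & \sum_{(i,J) \in A} f_{iJ}(z_{iJ}) \\ \text{subject to } & z \in Z, \\ & \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \hat{x}_{ik}^{(t)} \le \sum_{n=m}^{M_i} z_{iJ_n^{(i)}}, \quad \forall\, i \in \mathcal{N},\ m = 1, \ldots, M_i,\ t \in T, \\ & \hat{x}^{(t)} \in \hat{F}^{(t)}, \quad \forall\, t \in T, \end{aligned} \tag{3.4}$$

where $\hat{F}^{(t)}$ is the bounded polyhedron of points $\hat{x}^{(t)}$ satisfying the conservation of flow
constraints

$$\sum_{\{j \mid (i,j) \in A'\}} \hat{x}_{ij}^{(t)} - \sum_{\{j \mid (j,i) \in A'\}} \hat{x}_{ji}^{(t)} = \begin{cases} R_t & \text{if } i = s, \\ -R_t & \text{if } i = t, \\ 0 & \text{otherwise,} \end{cases} \quad \forall\, i \in \mathcal{N},$$

and non-negativity constraints

$$\hat{x}_{ij}^{(t)} \ge 0, \quad \forall\, (i, j) \in A'.$$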



Proposition 3.1. Suppose that $f(z) = \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ})$ and that
$f_{iJ_1^{(i)}}(\zeta) < f_{iJ_2^{(i)}}(\zeta) < \cdots < f_{iJ_{M_i}^{(i)}}(\zeta)$ for all $\zeta \ge 0$ and $i \in \mathcal{N}$.
Then problem (3.3) and problem (3.4) are
equivalent in the sense that they have the same optimal cost and $z$ is part of an optimal
solution for (3.3) if and only if it is part of an optimal solution for (3.4).

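Proof. Suppose $(x, z)$ is a feasible solution to problem (3.3). Then, for all $(i,j) \in \mathcal{A}'$
and $t \in T$,

\[
\begin{aligned}
\sum_{m=m(i,j)}^{M_i} z_{iJ_m^{(i)}}
&\ge \sum_{m=m(i,j)}^{M_i} \sum_{k \in J_m^{(i)}} x_{iJ_m^{(i)}k}^{(t)} \\
&= \sum_{k \in J_{M_i}^{(i)}} \ \sum_{m=\max(m(i,j),\,m(i,k))}^{M_i} x_{iJ_m^{(i)}k}^{(t)} \\
&\ge \sum_{k \in J_{M_i}^{(i)} \setminus J_{m(i,j)-1}^{(i)}} \ \sum_{m=\max(m(i,j),\,m(i,k))}^{M_i} x_{iJ_m^{(i)}k}^{(t)} \\
&= \sum_{k \in J_{M_i}^{(i)} \setminus J_{m(i,j)-1}^{(i)}} \ \sum_{m=m(i,k)}^{M_i} x_{iJ_m^{(i)}k}^{(t)} \\
&= \sum_{k \in J_{M_i}^{(i)} \setminus J_{m(i,j)-1}^{(i)}} \hat{x}_{ik}^{(t)}.
\end{aligned}
\]

Hence $(\hat{x}, z)$ is a feasible solution of problem (3.4) with the same cost.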
Now suppose $(\hat{x}, z)$ is an optimal solution of problem (3.4). Since
$f_{iJ_1^{(i)}}(\zeta) < f_{iJ_2^{(i)}}(\zeta) < \cdots < f_{iJ_{M_i}^{(i)}}(\zeta)$ for all $\zeta \ge 0$ and $i \in \mathcal{N}$ by assumption, it follows that,
for all $i \in \mathcal{N}$, the sequence $z_{iJ_1^{(i)}}, z_{iJ_2^{(i)}}, \ldots, z_{iJ_{M_i}^{(i)}}$ is given recursively, starting from
$m = M_i$, by

\[
z_{iJ_m^{(i)}} = \max_{t \in T} \left\{ \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \hat{x}_{ik}^{(t)} \right\} - \sum_{m'=m+1}^{M_i} z_{iJ_{m'}^{(i)}}.
\]

Hence $z_{iJ_m^{(i)}} \ge 0$ for all $i \in \mathcal{N}$ and $m = 1, 2, \ldots, M_i$. We then set, starting from
$m = M_i$ and $j \in J_{M_i}^{(i)}$,

\[
x_{iJ_m^{(i)}j}^{(t)} := \min\left( \hat{x}_{ij}^{(t)} - \sum_{l=m+1}^{M_i} x_{iJ_l^{(i)}j}^{(t)},\ \ z_{iJ_m^{(i)}} - \sum_{k \in J_{M_i}^{(i)} \setminus J_{m(i,j)}^{(i)}} x_{iJ_m^{(i)}k}^{(t)} \right).
\]

It is now not difficult to see that $(x, z)$ is a feasible solution of problem (3.3) with the
same cost.

Therefore, the optimal costs of problems (3.3) and (3.4) are the same and, since
the objective functions for the two problems are the same, $z$ is part of an optimal
solution for problem (3.3) if and only if it is part of an optimal solution for problem
(3.4). □
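
The recursion for $z_{iJ_m^{(i)}}$ used in the proof can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function name `subgraph_from_flows`, the list-of-sets representation of the nested hyperarcs of one node, and the dictionary layout of the flows are hypothetical choices made for illustration, not part of the original text.

```python
def subgraph_from_flows(nested_sets, flows):
    """Recover z_{iJ_1}, ..., z_{iJ_M} for one node i from the flows x_hat.

    nested_sets: list [J_1, ..., J_M] of Python sets with J_1 < J_2 < ... < J_M
                 (hypothetical representation of the nested hyperarcs of node i).
    flows:       dict mapping each sink t to a dict {k: x_hat_ik^(t)}.
    """
    M = len(nested_sets)
    z = [0.0] * M
    tail = 0.0  # running value of sum_{m' > m} z_{iJ_m'}
    for m in range(M, 0, -1):  # start from m = M and proceed to m = 1
        inner = nested_sets[m - 2] if m >= 2 else set()  # J_{m-1}, with J_0 empty
        region = nested_sets[M - 1] - inner              # J_M \ J_{m-1}
        # Largest flow, over all sinks, that must be carried into this region.
        need = max(sum(x.get(k, 0.0) for k in region) for x in flows.values())
        z[m - 1] = need - tail
        tail += z[m - 1]
    return z


# Two nested hyperarcs J_1 = {2}, J_2 = {2, 3} and two sinks "A" and "B".
z = subgraph_from_flows(
    [{2}, {2, 3}],
    {"A": {2: 0.5, 3: 0.2}, "B": {2: 0.1, 3: 0.4}},
)
```

Because the region $J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}$ grows as $m$ decreases, each computed value is non-negative, which matches the claim in the proof.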



3.1.1 An example 

Let us return again to the slotted Aloha relay channel described in Section 1.2.1.
The relevant optimization problem to solve in this case is (3.2), and it reduces to (cf.
Section 2.2.1)

\[
\begin{aligned}
\text{minimize} \quad & z_{1(23)} + z_{23} \\
\text{subject to} \quad & 0 \le z_{1(23)}, z_{23} \le 1, \\
& R \le z_{1(23)}(1 - z_{23})\left(p_{1(23)2} + p_{1(23)3} + p_{1(23)(23)}\right), \\
& R \le z_{1(23)}(1 - z_{23})\left(p_{1(23)3} + p_{1(23)(23)}\right) + (1 - z_{1(23)})\, z_{23}\, p_{233}.
\end{aligned}
\]

Figure 3.3: Feasible set of problem (3.5).

Let us assume some values for the parameters of the problem and work through
it. Let $R := 1/8$, $p_{1(23)2} := 9/16$, $p_{1(23)3} := 1/16$, $p_{1(23)(23)} := 3/16$, and $p_{233} := 3/4$.
Then the optimization problem we have is

\[
\begin{aligned}
\text{minimize} \quad & z_{1(23)} + z_{23} \\
\text{subject to} \quad & 0 \le z_{1(23)}, z_{23} \le 1, \\
& \frac{1}{8} \le \frac{13}{16}\, z_{1(23)}(1 - z_{23}), \\
& \frac{1}{8} \le \frac{1}{4}\, z_{1(23)}(1 - z_{23}) + \frac{3}{4}\,(1 - z_{1(23)})\, z_{23}.
\end{aligned}
\tag{3.5}
\]

The feasible set of this problem is shown in Figure 3.3. It is the shaded region labeled
$Z_0$. By inspection, the optimal solution of (3.5) is the lesser of the two intersections
between the curves defined by

\[
\frac{13}{16}\, z_{1(23)}(1 - z_{23}) = \frac{1}{8}
\]

and

\[
\frac{1}{4}\, z_{1(23)}(1 - z_{23}) + \frac{3}{4}\,(1 - z_{1(23)})\, z_{23} = \frac{1}{8}.
\]

We obtain $z_{1(23)}^* \approx 0.179$ and $z_{23}^* \approx 0.141$.
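
The intersection above can be checked numerically. The following Python sketch assumes, as the text does by inspection, that both rate constraints are active at the optimum; the function name `aloha_relay_optimum` and its argument names are hypothetical. Substituting $z_{1(23)} = c_1/(1 - z_{23})$ from the first curve into the second gives a quadratic in $z_{23}$, and the sketch keeps the root pair with the smaller total cost.

```python
import math


def aloha_relay_optimum(R, p_12, p_13, p_1_23, p_233):
    """Intersect the two active constraints of the relay example.

    Assumes (as in the text) that the optimum lies where both rate
    constraints hold with equality.
    """
    # First curve:  z1 * (1 - z23) = c1
    c1 = R / (p_12 + p_13 + p_1_23)
    # Second curve then gives:  (1 - z1) * z23 = c2
    c2 = (R - (p_13 + p_1_23) * c1) / p_233
    # Substituting z1 = c1 / (1 - z23):  z23^2 - (1 - c1 + c2) z23 + c2 = 0
    b = 1.0 - c1 + c2
    disc = b * b - 4.0 * c2
    candidates = []
    for sign in (-1.0, 1.0):
        z23 = (b + sign * math.sqrt(disc)) / 2.0
        z1 = c1 / (1.0 - z23)
        if 0.0 <= z1 <= 1.0 and 0.0 <= z23 <= 1.0:
            candidates.append((z1, z23))
    # The "lesser" intersection is the one with the smaller cost z1 + z23.
    return min(candidates, key=lambda z: z[0] + z[1])


z1, z23 = aloha_relay_optimum(1 / 8, 9 / 16, 1 / 16, 3 / 16, 3 / 4)
print(round(z1, 3), round(z23, 3))
```

With the parameter values of the example, the quadratic is $26 z_{23}^2 - 25 z_{23} + 3 = 0$, whose smaller root reproduces the values quoted in the text.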

The problem we have just solved is by no means trivial. We have taken a wireless
packet network subject to losses that are determined by a complicated set of
conditions, including medium contention, and found a way of establishing a given
unicast connection of fixed throughput using the minimum number of transmissions
per message packet. The solution is that node 1 transmits a packet every time slot
with probability 0.179, and node 2 transmits a packet every time slot independently
with probability 0.141. Whenever either node transmits a packet, it follows the
coding scheme of Section 2.1.

The network we dealt with was, unfortunately, only a small one, and the solution 
method we used will not straightforwardly scale to larger problems. But the solution 
method is conceptually simple, and there are cases where the solution to large prob- 
lems is computable — and computable in a distributed manner. This is the topic of 
the next section. 

3.2 Distributed algorithms 

In many cases, the optimization problems (3.2), (3.3), and (3.4) are convex or linear
problems and their solutions can, in theory, be computed. For practical network
applications, however, it is often important that solutions can be computed in a
distributed manner, with each node making computations based only on local knowledge
and knowledge acquired from information exchanges. Thus, we seek distributed
algorithms to solve optimization problems (3.2), (3.3), and (3.4), which, when paired
with the random linear coding scheme of the previous chapter, yield a distributed
approach to efficient operation. The algorithms we propose will generally take some
time to converge to an optimal solution, but it is not necessary to wait until the
algorithms have converged before transmission: we can apply the coding scheme to
the coding subgraph we have at any time, optimal or otherwise, and continue doing
so while it converges. Such an approach is robust to dynamics such as changes in
network topology that cause the optimal solution to change, because the algorithms
will simply converge toward the changing optimum.

To this end, we simplify the problem by assuming that the objective function is
of the form $f(z) = \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ})$, where $f_{iJ}$ is a monotonically increasing, convex
function, and that, as $z_{iJ}$ is varied, $z_{iJK}/z_{iJ}$ is constant for all $K \subset J$. Therefore, $b_{iJK}$
is a constant for all $(i,J) \in \mathcal{A}$ and $K \subset J$. We also drop the constraint set $Z$, noting
that separable constraints, at least, can be handled by making $f_{iJ}$ approach infinity
as $z_{iJ}$ approaches its upper constraint. These assumptions apply if, at least from the
perspective of the connection we wish to establish, links essentially behave
independently and medium access issues do not pose significant constraints, either because
they are non-existent or because they are dealt with separately. The assumptions
certainly restrict the range of applicable cases, but they are not impractical; they
apply, in particular, to all of the problems examined in our performance evaluation
in Chapter 4.

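With these assumptions, problem (3.2) becomes

\[
\begin{aligned}
\text{minimize} \quad & \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z_{iJ}) \\
\text{subject to} \quad & \sum_{j \in K} x_{iJj}^{(t)} \le z_{iJ}\, b_{iJK}, \quad \forall\, (i,J) \in \mathcal{A},\ K \subset J,\ t \in T, \\
& x^{(t)} \in F^{(t)}, \quad \forall\, t \in T.
\end{aligned}
\tag{3.6}
\]

Since the $f_{iJ}$ are monotonically increasing, the constraint

\[
\sum_{j \in K} x_{iJj}^{(t)} \le z_{iJ}\, b_{iJK}, \quad \forall\, (i,J) \in \mathcal{A},\ K \subset J,\ t \in T
\tag{3.7}
\]

gives

\[
z_{iJ} = \max_{K \subset J,\ t \in T} \left\{ \frac{\sum_{j \in K} x_{iJj}^{(t)}}{b_{iJK}} \right\}.
\tag{3.8}
\]

Expression (3.8) is, unfortunately, not very useful for algorithm design because the
max function is difficult to deal with, largely as a result of its not being differentiable
everywhere. One way to overcome this difficulty is to approximate $z_{iJ}$ by replacing
the max in (3.8) with an $l^m$-norm (see jSI]), i.e., to approximate $z_{iJ}$ with $z'_{iJ}$, where

\[
z'_{iJ} := \left( \sum_{K \subset J,\ t \in T} \left( \frac{\sum_{j \in K} x_{iJj}^{(t)}}{b_{iJK}} \right)^{m} \right)^{1/m}.
\]

The approximation becomes exact as $m \to \infty$. Moreover, since $z'_{iJ} \ge z_{iJ}$ for all $m > 0$,
the coding subgraph $z'$ admits the desired connection for any feasible solution.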
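
The behavior of the $l^m$-norm approximation can be seen in a few lines of Python. This is only an illustrative sketch: the helper name `lm_norm` and the sample values are assumptions, and the inputs stand for the non-negative ratios $\sum_{j \in K} x_{iJj}^{(t)} / b_{iJK}$ of a single hyperarc.

```python
def lm_norm(values, m):
    """l^m-norm of a list of non-negative values."""
    return sum(v ** m for v in values) ** (1.0 / m)


ratios = [0.2, 0.5, 0.4]   # hypothetical values of the ratios in (3.8)
exact = max(ratios)        # z_iJ as given by (3.8)
for m in (1, 2, 10, 50):
    # The approximation is never below the max and tightens as m grows.
    print(m, lm_norm(ratios, m), lm_norm(ratios, m) - exact)
```

The gap to the exact maximum is always non-negative, which is why the approximate subgraph $z'$ still admits the desired connection, and it shrinks as $m$ increases.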
Now the relevant optimization problem is

\[
\begin{aligned}
\text{minimize} \quad & \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z'_{iJ}) \\
\text{subject to} \quad & x^{(t)} \in F^{(t)}, \quad \forall\, t \in T,
\end{aligned}
\]

which is no more than a convex multicommodity flow problem. There are many
algorithms for convex multicommodity flow problems (see [HI] for a survey), some of
which (e.g., the algorithms in jHlEl) are well-suited for distributed implementation.
The primal-dual approach to internet congestion control (see [100, Section 3.4]) can
also be used to solve convex multicommodity flow problems in a distributed manner,
and we examine this method in Section 3.2.1.

There exist, therefore, numerous distributed algorithms for the subgraph selection
problem or, at least, for an approximation of the problem. What about distributed
algorithms for the true problem? One clear tactic for finding such algorithms is
to eliminate constraint (3.7) using Lagrange multipliers. Following this tactic, we
obtain a distributed algorithm that we call the subgradient method. We describe the
subgradient method in Section 3.2.2.
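3.2.1 Primal-dual method

For the primal-dual method, we assume that the cost functions $f_{iJ}$ are strictly convex
and differentiable. Hence there is a unique optimal solution to problem (3.6). We
present the algorithm for the lossless case, with the understanding that it can be
straightforwardly extended to the lossy case. Thus, the optimization problem we
address is

\[
\begin{aligned}
\text{minimize} \quad & \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z'_{iJ}) \\
\text{subject to} \quad & x^{(t)} \in F^{(t)}, \quad \forall\, t \in T,
\end{aligned}
\tag{3.9}
\]

where

\[
z'_{iJ} := \left( \sum_{t \in T} \left( \sum_{j \in J} x_{iJj}^{(t)} \right)^{m} \right)^{1/m}.
\]

Let $(y)_a^+$ denote the following function of $y$:

\[
(y)_a^+ =
\begin{cases}
y & \text{if } a > 0, \\
\max\{y, 0\} & \text{if } a \le 0.
\end{cases}
\]

To solve problem (3.9) in a distributed fashion, we introduce additional variables
$p$ and $\lambda$ and consider varying $x$, $p$, and $\lambda$ in time $\tau$ according to the following time
derivatives:

\[
\dot{x}_{iJj}^{(t)} = -k_{iJj}^{(t)}\!\left(x_{iJj}^{(t)}\right) \left( \frac{\partial f_{iJ}(z'_{iJ})}{\partial x_{iJj}^{(t)}} + q_{ij}^{(t)} - \lambda_{iJj}^{(t)} \right),
\tag{3.10}
\]
\[
\dot{p}_i^{(t)} = h_i^{(t)}\!\left(p_i^{(t)}\right) \left( y_i^{(t)} - \sigma_i^{(t)} \right),
\tag{3.11}
\]
\[
\dot{\lambda}_{iJj}^{(t)} = m_{iJj}^{(t)}\!\left(\lambda_{iJj}^{(t)}\right) \left( -x_{iJj}^{(t)} \right)^+_{\lambda_{iJj}^{(t)}},
\tag{3.12}
\]

where

\[
q_{ij}^{(t)} := p_i^{(t)} - p_j^{(t)},
\]
\[
y_i^{(t)} := \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\ i \in I\}} x_{jIi}^{(t)},
\]

and $k_{iJj}^{(t)}(x_{iJj}^{(t)}) > 0$, $h_i^{(t)}(p_i^{(t)}) > 0$, and $m_{iJj}^{(t)}(\lambda_{iJj}^{(t)}) > 0$ are non-decreasing continuous
functions of $x_{iJj}^{(t)}$, $p_i^{(t)}$, and $\lambda_{iJj}^{(t)}$ respectively.

Proposition 3.2. The algorithm specified by Equations (3.10)-(3.12) is globally,
asymptotically stable.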

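Proof. We prove the stability of the primal-dual algorithm by using the theory of
Lyapunov stability (see, e.g., [100, Section 3.10]). This proof is based on the proof of
Theorem 3.7 of [100].

The Lagrangian for problem (3.9) is as follows:

\[
\begin{aligned}
L(x, p, \lambda) = \sum_{(i,J) \in \mathcal{A}} f_{iJ}(z'_{iJ})
+ \sum_{t \in T} \Bigg\{ & \sum_{i \in \mathcal{N}} p_i^{(t)} \Bigg( \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\ i \in I\}} x_{jIi}^{(t)} - \sigma_i^{(t)} \Bigg) \\
& - \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \lambda_{iJj}^{(t)} x_{iJj}^{(t)} \Bigg\},
\end{aligned}
\tag{3.13}
\]

where

\[
\sigma_i^{(t)} =
\begin{cases}
R_t & \text{if } i = s, \\
-R_t & \text{if } i = t, \\
0 & \text{otherwise.}
\end{cases}
\]

Since the objective function of problem (3.9) is strictly convex, it has a unique
minimizing solution, say $\hat{x}$, and Lagrange multipliers, say $\hat{p}$ and $\hat{\lambda}$, which satisfy the
following Karush-Kuhn-Tucker conditions:

\[
\frac{\partial f_{iJ}(\hat{z}'_{iJ})}{\partial \hat{x}_{iJj}^{(t)}} + \left( \hat{p}_i^{(t)} - \hat{p}_j^{(t)} \right) - \hat{\lambda}_{iJj}^{(t)} = 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in T,
\tag{3.14}
\]
\[
\sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} \hat{x}_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\ i \in I\}} \hat{x}_{jIi}^{(t)} = \sigma_i^{(t)}, \quad \forall\, i \in \mathcal{N},\ t \in T,
\tag{3.15}
\]
\[
\hat{x}_{iJj}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in T,
\tag{3.16}
\]
\[
\hat{\lambda}_{iJj}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in T,
\tag{3.17}
\]
\[
\hat{\lambda}_{iJj}^{(t)} \hat{x}_{iJj}^{(t)} = 0, \quad \forall\, (i,J) \in \mathcal{A},\ j \in J,\ t \in T.
\tag{3.18}
\]

Using equation (3.13), we see that $(\hat{x}, \hat{p}, \hat{\lambda})$ is an equilibrium point of the primal-dual
algorithm. We now prove that this point is globally, asymptotically stable.
Consider the following function as a candidate for the Lyapunov function:

\[
\begin{aligned}
V(x, p, \lambda) = \sum_{t \in T} \Bigg\{ & \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \Bigg( \int_{\hat{x}_{iJj}^{(t)}}^{x_{iJj}^{(t)}} \frac{1}{k_{iJj}^{(t)}(\sigma)} \left( \sigma - \hat{x}_{iJj}^{(t)} \right) d\sigma
+ \int_{\hat{\lambda}_{iJj}^{(t)}}^{\lambda_{iJj}^{(t)}} \frac{1}{m_{iJj}^{(t)}(\gamma)} \left( \gamma - \hat{\lambda}_{iJj}^{(t)} \right) d\gamma \Bigg) \\
& + \sum_{i \in \mathcal{N}} \int_{\hat{p}_i^{(t)}}^{p_i^{(t)}} \frac{1}{h_i^{(t)}(\beta)} \left( \beta - \hat{p}_i^{(t)} \right) d\beta \Bigg\}.
\end{aligned}
\]

Note that $V(\hat{x}, \hat{p}, \hat{\lambda}) = 0$. Since $k_{iJj}^{(t)}(\sigma) > 0$, if $x_{iJj}^{(t)} \ne \hat{x}_{iJj}^{(t)}$, we have
$\int_{\hat{x}_{iJj}^{(t)}}^{x_{iJj}^{(t)}} \frac{1}{k_{iJj}^{(t)}(\sigma)} (\sigma - \hat{x}_{iJj}^{(t)})\, d\sigma > 0$. This argument can be extended to the other terms as well. Thus,
whenever $(x, p, \lambda) \ne (\hat{x}, \hat{p}, \hat{\lambda})$, we have $V(x, p, \lambda) > 0$.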
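Now,

\[
\dot{V} = \sum_{t \in T} \Bigg\{ \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \Bigg[ \left( -x_{iJj}^{(t)} \right)^+_{\lambda_{iJj}^{(t)}} \left( \lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)} \right)
- \Bigg( \frac{\partial f_{iJ}(z'_{iJ})}{\partial x_{iJj}^{(t)}} + q_{ij}^{(t)} - \lambda_{iJj}^{(t)} \Bigg) \left( x_{iJj}^{(t)} - \hat{x}_{iJj}^{(t)} \right) \Bigg]
+ \sum_{i \in \mathcal{N}} \left( y_i^{(t)} - \sigma_i^{(t)} \right) \left( p_i^{(t)} - \hat{p}_i^{(t)} \right) \Bigg\}.
\]

Note that

\[
\left( -x_{iJj}^{(t)} \right)^+_{\lambda_{iJj}^{(t)}} \left( \lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)} \right) \le -x_{iJj}^{(t)} \left( \lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)} \right),
\]

since the inequality is an equality if either $x_{iJj}^{(t)} \le 0$ or $\lambda_{iJj}^{(t)} > 0$; and, in the case when
$x_{iJj}^{(t)} > 0$ and $\lambda_{iJj}^{(t)} \le 0$, we have $(-x_{iJj}^{(t)})^+_{\lambda_{iJj}^{(t)}} = 0$ and, since $\hat{\lambda}_{iJj}^{(t)} \ge 0$,
$-x_{iJj}^{(t)} (\lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)}) \ge 0$. Therefore,

\[
\begin{aligned}
\dot{V} &\le \sum_{t \in T} \Bigg\{ \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \Bigg[ -x_{iJj}^{(t)} \left( \lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)} \right)
- \Bigg( \frac{\partial f_{iJ}(z'_{iJ})}{\partial x_{iJj}^{(t)}} + q_{ij}^{(t)} - \lambda_{iJj}^{(t)} \Bigg) \left( x_{iJj}^{(t)} - \hat{x}_{iJj}^{(t)} \right) \Bigg]
+ \sum_{i \in \mathcal{N}} \left( y_i^{(t)} - \sigma_i^{(t)} \right) \left( p_i^{(t)} - \hat{p}_i^{(t)} \right) \Bigg\} \\
&= (\hat{q} - q)'(x - \hat{x}) + (\hat{p} - p)'(\hat{y} - y)
+ \sum_{t \in T} \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \Bigg[ -\hat{x}_{iJj}^{(t)} \left( \lambda_{iJj}^{(t)} - \hat{\lambda}_{iJj}^{(t)} \right)
- \Bigg( \frac{\partial f_{iJ}(z'_{iJ})}{\partial x_{iJj}^{(t)}} - \frac{\partial f_{iJ}(\hat{z}'_{iJ})}{\partial \hat{x}_{iJj}^{(t)}} \Bigg) \left( x_{iJj}^{(t)} - \hat{x}_{iJj}^{(t)} \right) \Bigg] \\
&= \sum_{t \in T} \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \Bigg( \frac{\partial f_{iJ}(\hat{z}'_{iJ})}{\partial \hat{x}_{iJj}^{(t)}} - \frac{\partial f_{iJ}(z'_{iJ})}{\partial x_{iJj}^{(t)}} \Bigg) \left( x_{iJj}^{(t)} - \hat{x}_{iJj}^{(t)} \right) - \lambda' \hat{x},
\end{aligned}
\]

where the last line follows from Karush-Kuhn-Tucker conditions (3.14)-(3.18) and
the fact that

\[
p'y = \sum_{t \in T} \sum_{i \in \mathcal{N}} p_i^{(t)} \Bigg( \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)} - \sum_{\{j \mid (j,I) \in \mathcal{A},\ i \in I\}} x_{jIi}^{(t)} \Bigg)
= \sum_{t \in T} \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} x_{iJj}^{(t)} \left( p_i^{(t)} - p_j^{(t)} \right) = q'x.
\]

Thus, owing to the strict convexity of the functions $\{f_{iJ}\}$, we have $\dot{V} \le -\lambda' \hat{x}$, with
equality if and only if $x = \hat{x}$. So it follows that $\dot{V} \le 0$ for all $\lambda \ge 0$, since $\hat{x} \ge 0$.

If the initial choice of $\lambda$ is such that $\lambda(0) \ge 0$, we see from the primal-dual
algorithm that $\lambda(\tau) \ge 0$. This is true since $\dot{\lambda} \ge 0$ whenever $\lambda \le 0$. Thus, it follows by
the theory of Lyapunov stability that the algorithm is indeed globally, asymptotically
stable. □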

The global, asymptotic stability of the algorithm implies that no matter what
the initial choice of $(x, p)$ is, the primal-dual algorithm will converge to the unique
solution of problem (3.9). We have to choose $\lambda$, however, with non-negative entries as
the initial choice. Further, there is no guarantee that $x(\tau)$ yields a feasible solution
for any given $\tau$. Therefore, a start-up time may be required before a feasible solution
is obtained.

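The algorithm that we currently have is a continuous time algorithm and, in
practice, an algorithm operating in discrete message exchanges is required. To discretize
the algorithm, we consider time steps $n = 0, 1, \ldots$ and replace the derivatives by
differences:

\[
x_{iJj}^{(t)}[n+1] = x_{iJj}^{(t)}[n] - \alpha_{iJj}^{(t)}[n] \left( \frac{\partial f_{iJ}(z'_{iJ}[n])}{\partial x_{iJj}^{(t)}[n]} + q_{ij}^{(t)}[n] - \lambda_{iJj}^{(t)}[n] \right),
\tag{3.19}
\]
\[
p_i^{(t)}[n+1] = p_i^{(t)}[n] + \beta_i^{(t)}[n] \left( y_i^{(t)}[n] - \sigma_i^{(t)} \right),
\tag{3.20}
\]
\[
\lambda_{iJj}^{(t)}[n+1] = \lambda_{iJj}^{(t)}[n] + \gamma_{iJj}^{(t)}[n] \left( -x_{iJj}^{(t)}[n] \right)^+_{\lambda_{iJj}^{(t)}[n]},
\tag{3.21}
\]

where

\[
q_{ij}^{(t)}[n] := p_i^{(t)}[n] - p_j^{(t)}[n],
\]
\[
y_i^{(t)}[n] := \sum_{\{J \mid (i,J) \in \mathcal{A}\}} \sum_{j \in J} x_{iJj}^{(t)}[n] - \sum_{\{j \mid (j,I) \in \mathcal{A},\ i \in I\}} x_{jIi}^{(t)}[n],
\]

and $\alpha_{iJj}^{(t)}[n] > 0$, $\beta_i^{(t)}[n] > 0$, and $\gamma_{iJj}^{(t)}[n] > 0$ are step sizes. This discretized algorithm
operates in synchronous rounds, with nodes exchanging information in each round.
We expect that this synchronicity can be relaxed in practice, but this issue remains
to be investigated.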
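
As a rough illustration of the discretized updates, the Python sketch below applies one step of the flow update and of the multiplier update to a single flow variable. It is a sketch under assumptions: the names `plus` and `primal_dual_step` are hypothetical, the gradient of the cost is passed in as a number rather than computed, and the node-price update is omitted for brevity.

```python
def plus(y, a):
    """The function (y)_a^+ from the text: y if a > 0, else max(y, 0)."""
    return y if a > 0 else max(y, 0.0)


def primal_dual_step(x, lam, q, grad, alpha, gamma):
    """One discrete update of a flow variable x and its multiplier lam.

    x, lam : current values of the flow variable and its multiplier
    q      : price difference p_i - p_j across the hyperarc
    grad   : partial derivative of the cost with respect to x (assumed given)
    alpha, gamma : positive step sizes
    """
    x_next = x - alpha * (grad + q - lam)       # flow update
    lam_next = lam + gamma * plus(-x, lam)      # multiplier update
    return x_next, lam_next


# With lam = 0 and x > 0 the multiplier stays at zero, while the flow
# moves against the sign of (grad + q - lam).
print(primal_dual_step(x=0.5, lam=0.0, q=-1.0, grad=0.4, alpha=0.1, gamma=0.1))
```

The role of $(y)_a^+$ is visible here: when the multiplier is not positive, the update can only push it upward.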

We associate a processor with each node. We assume that the processor for node
$i$ keeps track of the variables $p_i$, $\{x_{iJj}\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$, and $\{\lambda_{iJj}\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$. With
such an assignment of variables to processors, the algorithm is distributed in the sense
that a node exchanges information only with its neighbors at every iteration of the
primal-dual algorithm. We summarize the primal-dual method in Figure 3.4.

1. Each node $i$ initializes $p_i[0]$, $\{x_{iJj}[0]\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$, and
$\{\lambda_{iJj}[0]\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$ such that $\lambda_{iJj}[0] \ge 0$ for all $(J,j)$ such
that $(i,J) \in \mathcal{A}$ and $j \in J$. Each node $i$ sends $p_i[0]$, $\{x_{iJj}[0]\}_{j \in J}$, and
$\{\lambda_{iJj}[0]\}_{j \in J}$ over each outgoing hyperarc $(i,J)$.

2. At the $n$th iteration, each node $i$ computes $p_i[n+1]$, $\{x_{iJj}[n+1]\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$,
and $\{\lambda_{iJj}[n+1]\}_{\{J,j \mid (i,J) \in \mathcal{A},\ j \in J\}}$ using equations
(3.19)-(3.21). Each node $i$ sends $p_i[n+1]$, $\{x_{iJj}[n+1]\}_{j \in J}$, and
$\{\lambda_{iJj}[n+1]\}_{j \in J}$ over each outgoing hyperarc $(i,J)$.

3. The current coding subgraph $z'[n]$ is computed. For each node $i$, we
set

\[
z'_{iJ}[n] := \left( \sum_{t \in T} \left( \sum_{j \in J} x_{iJj}^{(t)}[n] \right)^{m} \right)^{1/m}
\]

for all outgoing hyperarcs $(i,J)$.

4. Steps 2 and 3 are repeated until the sequence of coding subgraphs
$\{z'[n]\}$ converges.

Figure 3.4: Summary of the primal-dual method.

3.2.2 Subgradient method

We present the subgradient method for linear cost functions; with some modifications,
it may be made to apply also to convex ones. Thus, we assume that the objective
function $f$ is of the form

\[
f(z) = \sum_{(i,J) \in \mathcal{A}} a_{iJ} z_{iJ},
\]

where $a_{iJ} > 0$.
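Consider the Lagrangian dual of problem (3.6):

\[
\begin{aligned}
\text{maximize} \quad & \sum_{t \in T} q^{(t)}(p^{(t)}) \\
\text{subject to} \quad & \sum_{t \in T} \sum_{K \subset J} p_{iJK}^{(t)} = a_{iJ}, \quad \forall\, (i,J) \in \mathcal{A}, \\
& p_{iJK}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ K \subset J,\ t \in T,
\end{aligned}
\tag{3.22}
\]

where

\[
q^{(t)}(p^{(t)}) := \min_{x^{(t)} \in F^{(t)}} \sum_{(i,J) \in \mathcal{A}} \sum_{j \in J} \left( \sum_{\{K \subset J \mid K \ni j\}} \frac{p_{iJK}^{(t)}}{b_{iJK}} \right) x_{iJj}^{(t)}.
\tag{3.23}
\]

In the lossless case, the dual problem defined by equations (3.22) and (3.23)
simplifies somewhat, and we require only a single dual variable $p_{iJJ}^{(t)}$ for each hyperarc
$(i,J)$. In the case that relates to optimization problem (3.4), the dual problem
simplifies more still, as there are fewer primal variables associated with it. Specifically,
we obtain, for the Lagrangian dual,

\[
\begin{aligned}
\text{maximize} \quad & \sum_{t \in T} \hat{q}^{(t)}(p^{(t)}) \\
\text{subject to} \quad & \sum_{t \in T} p_{iJ_m^{(i)}}^{(t)} = s_{iJ_m^{(i)}}, \quad \forall\, i \in \mathcal{N},\ m = 1, \ldots, M_i, \\
& p_{iJ}^{(t)} \ge 0, \quad \forall\, (i,J) \in \mathcal{A},\ t \in T,
\end{aligned}
\tag{3.24}
\]

where

\[
s_{iJ_m^{(i)}} := a_{iJ_m^{(i)}} - a_{iJ_{m-1}^{(i)}},
\]

and

\[
\hat{q}^{(t)}(p^{(t)}) := \min_{\hat{x}^{(t)} \in \hat{F}^{(t)}} \sum_{(i,j) \in \mathcal{A}'} \left( \sum_{m=1}^{m(i,j)} p_{iJ_m^{(i)}}^{(t)} \right) \hat{x}_{ij}^{(t)}.
\tag{3.25}
\]

Note that, by the assumptions of the problem, $s_{iJ} > 0$ for all $(i,J) \in \mathcal{A}$.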

In all three cases, the dual problems are very similar, and essentially the same
algorithm can be used to solve them. We present the subgradient method for the
case that relates to optimization problem (3.4), namely, the primal problem

\[
\begin{aligned}
\text{minimize} \quad & \sum_{(i,J) \in \mathcal{A}} a_{iJ} z_{iJ} \\
\text{subject to} \quad & \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \hat{x}_{ik}^{(t)} \le \sum_{n=m}^{M_i} z_{iJ_n^{(i)}}, \quad \forall\, i \in \mathcal{N},\ m = 1, \ldots, M_i,\ t \in T, \\
& \hat{x}^{(t)} \in \hat{F}^{(t)}, \quad \forall\, t \in T,
\end{aligned}
\tag{3.26}
\]

with dual (3.24), with the understanding that straightforward modifications can be
made for the other cases.

We first note that problem (3.25) is, in fact, a shortest path problem, which admits
a simple, asynchronous distributed solution known as the distributed asynchronous 
Bellman-Ford algorithm (see, e.g., Section 5.2.4]). 
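
To make the shortest path structure concrete, the following Python sketch runs the Bellman-Ford relaxation toward a sink on a small graph. It is a centralized stand-in for the distributed asynchronous version mentioned in the text, and everything in it is an assumption made for illustration: the function name `bellman_ford_to_sink`, the dictionary of arc costs (which would play the role of the dual-variable sums in the shortest path problem), and the example graph.

```python
def bellman_ford_to_sink(nodes, cost, sink):
    """Shortest distances and next hops from every node to `sink`.

    nodes: iterable of node labels
    cost:  dict mapping an arc (i, j) to its non-negative cost
    """
    inf = float("inf")
    dist = {v: inf for v in nodes}
    next_hop = {v: None for v in nodes}
    dist[sink] = 0.0
    for _ in range(len(dist) - 1):
        changed = False
        for (i, j), c in cost.items():
            # Relax arc (i, j): going through j may shorten i's distance.
            if dist[j] + c < dist[i]:
                dist[i] = dist[j] + c
                next_hop[i] = j
                changed = True
        if not changed:
            break
    return dist, next_hop


# Hypothetical four-node example with sink 4.
costs = {(1, 2): 1.0, (1, 3): 0.5, (2, 4): 1.0, (3, 4): 2.0}
dist, next_hop = bellman_ford_to_sink([1, 2, 3, 4], costs, sink=4)
print(dist[1], next_hop[1])
```

Sending the full rate $R_t$ along the resulting path from $s$ to $t$ gives a minimizing flow for the corresponding sink; in the distributed version each node would perform the same relaxation using only the distances reported by its neighbors.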

Now, to solve the dual problem (3.24), we employ subgradient optimization (see,
e.g., jni Section 6.3.1] or jS21 Section 1.2.4]). We start with an iterate $p[0]$ in the
feasible set of (3.24) and, given an iterate $p[n]$ for some non-negative integer $n$, we
solve problem (3.25) for each $t$ in $T$ to obtain $\hat{x}[n]$. Let

\[
g_{iJ_m^{(i)}}^{(t)}[n] := \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \hat{x}_{ik}^{(t)}[n].
\]

We then assign

\[
p_{iJ}[n+1] := \arg\min_{v \in P_{iJ}} \sum_{t \in T} \left( v^{(t)} - \left( p_{iJ}^{(t)}[n] + \theta[n]\, g_{iJ}^{(t)}[n] \right) \right)^2
\tag{3.27}
\]

for each $(i,J) \in \mathcal{A}$, where $P_{iJ}$ is the $|T|$-dimensional simplex

\[
P_{iJ} = \left\{ v \ \middle|\ \sum_{t \in T} v^{(t)} = s_{iJ},\ v \ge 0 \right\}
\]

and $\theta[n] > 0$ is an appropriate step size. In other words, $p_{iJ}[n+1]$ is set to be the
Euclidean projection of $p_{iJ}[n] + \theta[n]\, g_{iJ}[n]$ onto $P_{iJ}$.

To perform the projection, we use the following proposition.

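Proposition 3.3. Let $u := p_{iJ}[n] + \theta[n]\, g_{iJ}[n]$. Suppose we index the elements of $T$
such that $u^{(t_1)} \ge u^{(t_2)} \ge \cdots \ge u^{(t_{|T|})}$. Take $\hat{k}$ to be the smallest $k$ such that

\[
\frac{1}{k} \left( s_{iJ} - \sum_{r=1}^{k} u^{(t_r)} \right) \le -u^{(t_{k+1})}
\]

or set $\hat{k} = |T|$ if no such $k$ exists. Then the projection (3.27) is achieved by

\[
p_{iJ}^{(t)}[n+1] =
\begin{cases}
u^{(t)} + \dfrac{s_{iJ} - \sum_{r=1}^{\hat{k}} u^{(t_r)}}{\hat{k}} & \text{if } t \in \{t_1, \ldots, t_{\hat{k}}\}, \\
0 & \text{otherwise.}
\end{cases}
\]
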

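Proof. We wish to solve the following problem.

\[
\begin{aligned}
\text{minimize} \quad & \sum_{t \in T} \left( v^{(t)} - u^{(t)} \right)^2 \\
\text{subject to} \quad & v \in P_{iJ}.
\end{aligned}
\]

First, since the objective function and the constraint set $P_{iJ}$ are both convex, it
is straightforward to establish that a necessary and sufficient condition for global
optimality of $\hat{v}^{(t)}$ in $P_{iJ}$ is

\[
\hat{v}^{(t)} > 0 \ \Rightarrow\ \left( u^{(t)} - \hat{v}^{(t)} \right) \ge \left( u^{(r)} - \hat{v}^{(r)} \right), \quad \forall\, r \in T
\tag{3.28}
\]

(see, e.g., PI Section 2.1]). Suppose we index the elements of $T$ such that
$u^{(t_1)} \ge u^{(t_2)} \ge \cdots \ge u^{(t_{|T|})}$. We then note that there must be an index $k$ in the set $\{1, \ldots, |T|\}$
such that $\hat{v}^{(t_l)} > 0$ for $l = 1, \ldots, k$ and $\hat{v}^{(t_l)} = 0$ for $l \ge k + 1$, for, if not, then a
feasible solution with lower cost can be obtained by swapping around components of
the vector. Therefore, condition (3.28) implies that there must exist some $d$ such that
$\hat{v}^{(t)} = u^{(t)} + d$ for all $t \in \{t_1, \ldots, t_k\}$ and that $d \le -u^{(t)}$ for all $t \in \{t_{k+1}, \ldots, t_{|T|}\}$,
which is equivalent to $d \le -u^{(t_{k+1})}$. Since $\hat{v}^{(t)}$ is in the simplex $P_{iJ}$, it follows that

\[
k d + \sum_{r=1}^{k} u^{(t_r)} = s_{iJ},
\]

which gives

\[
d = \frac{1}{k} \left( s_{iJ} - \sum_{r=1}^{k} u^{(t_r)} \right).
\]

By taking $k = \hat{k}$, where $\hat{k}$ is the smallest $k$ such that

\[
\frac{1}{k} \left( s_{iJ} - \sum_{r=1}^{k} u^{(t_r)} \right) \le -u^{(t_{k+1})}
\]

(or, if no such $k$ exists, then $\hat{k} = |T|$), we see that we have

\[
\frac{1}{\hat{k} - 1} \left( s_{iJ} - \sum_{r=1}^{\hat{k} - 1} u^{(t_r)} \right) > -u^{(t_{\hat{k}})},
\]

which can be rearranged to give

\[
d = \frac{1}{\hat{k}} \left( s_{iJ} - \sum_{r=1}^{\hat{k}} u^{(t_r)} \right) > -u^{(t_{\hat{k}})}.
\]

Hence, if $\hat{v}^{(t)}$ is given by

\[
\hat{v}^{(t)} =
\begin{cases}
u^{(t)} + \dfrac{1}{\hat{k}} \left( s_{iJ} - \sum_{r=1}^{\hat{k}} u^{(t_r)} \right) & \text{if } t \in \{t_1, \ldots, t_{\hat{k}}\}, \\
0 & \text{otherwise,}
\end{cases}
\tag{3.29}
\]

then $\hat{v}^{(t)}$ is feasible and we see that the optimality condition (3.28) is satisfied. Note
that, since $d \le -u^{(t_{\hat{k}+1})}$, equation (3.29) can also be written as

\[
\hat{v}^{(t)} = \max\left( 0,\ u^{(t)} + \frac{1}{\hat{k}} \left( s_{iJ} - \sum_{r=1}^{\hat{k}} u^{(t_r)} \right) \right).
\tag{3.30}
\]

□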
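
The projection of Proposition 3.3 translates directly into code. The Python sketch below is illustrative only: the function name `project_onto_simplex` and the list representation of $u$ are assumptions, and the final step uses the max form (3.30) so that no bookkeeping of the sorted indices is needed.

```python
def project_onto_simplex(u, s):
    """Euclidean projection of u onto {v : sum(v) = s, v >= 0}, with s > 0.

    Follows the construction of Proposition 3.3: sort the entries of u in
    non-increasing order, find the smallest k satisfying the stopping
    condition, then shift the k largest entries by a common offset d.
    """
    order = sorted(u, reverse=True)
    n = len(order)
    k_hat = n                      # default if no k satisfies the condition
    partial = 0.0
    for k in range(1, n):          # k = 1, ..., n-1 (u^(t_{k+1}) must exist)
        partial += order[k - 1]
        if (s - partial) / k <= -order[k]:
            k_hat = k
            break
    d = (s - sum(order[:k_hat])) / k_hat
    # Equation (3.30): entries outside the top k_hat are clipped to zero.
    return [max(0.0, x + d) for x in u]


print(project_onto_simplex([0.5, 0.2, -0.3], 1.0))
```

In the subgradient method, `u` would hold the entries $p_{iJ}^{(t)}[n] + \theta[n]\, g_{iJ}^{(t)}[n]$ over all sinks $t$, and `s` would be $s_{iJ}$.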

The disadvantage of subgradient optimization is that, whilst it yields good
approximations of the optimal value of the Lagrangian dual problem (3.24) after sufficient
iteration, it does not necessarily yield a primal optimal solution. There are, however,
methods for recovering primal solutions in subgradient optimization. We employ the
following method, which is due to Sherali and Choi.

Let $\{\mu_l[n]\}_{l=1,\ldots,n}$ be a sequence of convex combination weights for each
non-negative integer $n$, i.e., $\sum_{l=1}^{n} \mu_l[n] = 1$ and $\mu_l[n] \ge 0$ for all $l = 1, \ldots, n$. Further, let
us define

\[
\gamma_{ln} := \frac{\mu_l[n]}{\theta[l]}, \quad l = 1, \ldots, n,\ n = 0, 1, \ldots,
\]

and

\[
\Delta\gamma_n^{\max} := \max_{l = 2, \ldots, n} \left\{ \gamma_{ln} - \gamma_{(l-1)n} \right\}.
\]

Proposition 3.4. If the step sizes $\{\theta[n]\}$ and convex combination weights $\{\mu_l[n]\}$ are
chosen such that

1. $\gamma_{ln} \ge \gamma_{(l-1)n}$ for all $l = 2, \ldots, n$ and $n = 0, 1, \ldots$,

2. $\Delta\gamma_n^{\max} \to 0$ as $n \to \infty$, and

3. $\gamma_{1n} \to 0$ as $n \to \infty$ and $\gamma_{nn} \le \delta$ for all $n = 0, 1, \ldots$ for some $\delta > 0$,

then we obtain an optimal solution to the primal problem from any accumulation point
of the sequence of primal iterates $\{\tilde{x}[n]\}$ given by

\[
\tilde{x}[n] := \sum_{l=1}^{n} \mu_l[n]\, \hat{x}[l], \quad n = 0, 1, \ldots.
\tag{3.31}
\]
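Proof. Suppose that the dual feasible solution that the subgradient method converges
to is $\bar{p}$. Then, using (3.27), there exists some $m$ such that for $n \ge m$

\[
p_{iJ}^{(t)}[n+1] = p_{iJ}^{(t)}[n] + \theta[n]\, g_{iJ}^{(t)}[n] + c_{iJ}[n]
\]

for all $(i,J) \in \mathcal{A}$ and $t \in T$ such that $\bar{p}_{iJ}^{(t)} > 0$.

Let $\tilde{g}[n] := \sum_{l=1}^{n} \mu_l[n]\, g[l]$. Consider some $(i,J) \in \mathcal{A}$ and $t \in T$. If $\bar{p}_{iJ}^{(t)} > 0$, then
for $n > m$ we have

\[
\begin{aligned}
\tilde{g}_{iJ}^{(t)}[n] &= \sum_{l=1}^{m} \mu_l[n]\, g_{iJ}^{(t)}[l] + \sum_{l=m+1}^{n} \mu_l[n]\, g_{iJ}^{(t)}[l] \\
&= \sum_{l=1}^{m} \mu_l[n]\, g_{iJ}^{(t)}[l] + \sum_{l=m+1}^{n} \frac{\mu_l[n]}{\theta[l]} \left( p_{iJ}^{(t)}[l+1] - p_{iJ}^{(t)}[l] - c_{iJ}[l] \right) \\
&= \sum_{l=1}^{m} \mu_l[n]\, g_{iJ}^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln} \left( p_{iJ}^{(t)}[l+1] - p_{iJ}^{(t)}[l] \right) - \sum_{l=m+1}^{n} \gamma_{ln}\, c_{iJ}[l].
\end{aligned}
\tag{3.32}
\]

Otherwise, if $\bar{p}_{iJ}^{(t)} = 0$, then from equation (3.30), we have

\[
p_{iJ}^{(t)}[n+1] \ge p_{iJ}^{(t)}[n] + \theta[n]\, g_{iJ}^{(t)}[n] + c_{iJ}[n],
\]

so

\[
\tilde{g}_{iJ}^{(t)}[n] \le \sum_{l=1}^{m} \mu_l[n]\, g_{iJ}^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln} \left( p_{iJ}^{(t)}[l+1] - p_{iJ}^{(t)}[l] \right) - \sum_{l=m+1}^{n} \gamma_{ln}\, c_{iJ}[l].
\tag{3.33}
\]

It is straightforward to see that the sequence of iterates $\{\tilde{x}[n]\}$ is primal feasible,
and that we obtain a primal feasible sequence $\{z[n]\}$ by setting

\[
\begin{aligned}
z_{iJ_m^{(i)}}[n] &:= \max_{t \in T} \left\{ \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \tilde{x}_{ik}^{(t)}[n] \right\} - \sum_{m'=m+1}^{M_i} z_{iJ_{m'}^{(i)}}[n] \\
&= \max_{t \in T} \tilde{g}_{iJ_m^{(i)}}^{(t)}[n] - \sum_{m'=m+1}^{M_i} z_{iJ_{m'}^{(i)}}[n]
\end{aligned}
\]

recursively, starting from $m = M_i$ and proceeding through to $m = 1$. Sherali and
Choi jnS] showed that, if the required conditions on the step sizes $\{\theta[n]\}$ and convex
combination weights $\{\mu_l[n]\}$ are satisfied, then

\[
\sum_{l=1}^{m} \mu_l[n]\, g_{iJ}^{(t)}[l] + \sum_{l=m+1}^{n} \gamma_{ln} \left( p_{iJ}^{(t)}[l+1] - p_{iJ}^{(t)}[l] \right) \to 0
\]

as $n \to \infty$; hence we see from equations (3.32) and (3.33) that, for $n$ sufficiently large,

\[
\sum_{m'=m}^{M_i} z_{iJ_{m'}^{(i)}}[n] = -\sum_{l=m+1}^{n} \gamma_{ln}\, c_{iJ_m^{(i)}}[l].
\]

Recalling the primal problem (3.26), we see that complementary slackness with $\bar{p}$
holds in the limit of any convergent subsequence of $\{\tilde{x}[n]\}$. □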

The required conditions on the step sizes and convex combination weights are
satisfied by the following choices (jHSl Corollaries 2-4]):

1. step sizes $\{\theta[n]\}$ such that $\theta[n] > 0$, $\lim_{n \to \infty} \theta[n] = 0$, $\sum_{n=1}^{\infty} \theta[n] = \infty$, and convex
combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = \theta[l] / \sum_{k=1}^{n} \theta[k]$ for all $l = 1, \ldots, n$,
$n = 0, 1, \ldots$;

2. step sizes $\{\theta[n]\}$ given by $\theta[n] = a/(b + cn)$ for all $n = 0, 1, \ldots$, where $a > 0$,
$b \ge 0$ and $c > 0$, and convex combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = 1/n$
for all $l = 1, \ldots, n$, $n = 0, 1, \ldots$; and

3. step sizes $\{\theta[n]\}$ given by $\theta[n] = n^{-\alpha}$ for all $n = 0, 1, \ldots$, where $0 < \alpha < 1$, and
convex combination weights $\{\mu_l[n]\}$ given by $\mu_l[n] = 1/n$ for all $l = 1, \ldots, n$,
$n = 0, 1, \ldots$.

Moreover, for all three choices, we have $\mu_l[n+1]/\mu_l[n]$ independent of $l$ for all $n$, so
primal iterates can be computed iteratively using

\[
\begin{aligned}
\tilde{x}[n] &= \sum_{l=1}^{n} \mu_l[n]\, \hat{x}[l] \\
&= \sum_{l=1}^{n-1} \mu_l[n]\, \hat{x}[l] + \mu_n[n]\, \hat{x}[n] \\
&= \phi[n-1]\, \tilde{x}[n-1] + \mu_n[n]\, \hat{x}[n],
\end{aligned}
\]

where $\phi[n] := \mu_l[n+1]/\mu_l[n]$.
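
The recursive form of the primal iterate can be sketched in Python for the first choice of weights, $\mu_l[n] = \theta[l] / \sum_{k=1}^{n} \theta[k]$, for which $\phi[n-1]$ is the ratio of consecutive partial sums of the step sizes. The function name `averaged_iterates` and the scalar iterates are assumptions made to keep the example small; in the method itself each iterate is a vector of flows.

```python
def averaged_iterates(x_hats, thetas):
    """Running primal iterates x_tilde[1], x_tilde[2], ... (scalar case).

    x_hats[l-1] is the iterate x_hat[l] and thetas[l-1] the step size
    theta[l].  Uses weights mu_l[n] = theta[l] / sum_{k<=n} theta[k].
    """
    out = []
    total = 0.0      # sum of the step sizes seen so far
    x_tilde = 0.0
    for x_hat, theta in zip(x_hats, thetas):
        new_total = total + theta
        phi = total / new_total          # phi[n-1] = mu_l[n] / mu_l[n-1]
        mu_nn = theta / new_total        # mu_n[n]
        x_tilde = phi * x_tilde + mu_nn * x_hat
        total = new_total
        out.append(x_tilde)
    return out


xs = [1.0, 3.0, 2.0]
thetas = [1.0, 0.5, 0.25]
print(averaged_iterates(xs, thetas))
```

Each node therefore needs to store only the previous averaged iterate and the running sum of step sizes, rather than the full history of iterates.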

This gives us our distributed algorithm. We summarize the subgradient method
in Figure 3.5. We see that, although the method is indeed a distributed algorithm, it
again operates in synchronous rounds. Again, we expect that this synchronicity can
be relaxed in practice, but this issue remains to be investigated.

1. Each node $i$ computes $s_{iJ}$ for its outgoing hyperarcs and initializes
$p_{iJ}[0]$ to a point in the feasible set of (3.24). For example, we take
$p_{iJ}^{(t)}[0] := s_{iJ}/|T|$. Each node $i$ sends $s_{iJ}$ and $p_{iJ}[0]$ over each outgoing
hyperarc $(i,J)$.

2. At the $n$th iteration, use $p^{(t)}[n]$ as the hyperarc costs and run a
distributed shortest path algorithm, such as distributed Bellman-Ford,
to determine $\hat{x}^{(t)}[n]$ for all $t \in T$.

3. Each node $i$ computes $p_{iJ}[n+1]$ for its outgoing hyperarcs using
Proposition 3.3. Each node $i$ sends $p_{iJ}[n+1]$ over each outgoing
hyperarc $(i,J)$.

4. Nodes compute the primal iterate $\tilde{x}[n]$ by setting

\[
\tilde{x}[n] := \sum_{l=1}^{n} \mu_l[n]\, \hat{x}[l].
\]

5. The current coding subgraph $z[n]$ is computed using the primal iterate
$\tilde{x}[n]$. For each node $i$, we set

\[
z_{iJ_m^{(i)}}[n] := \max_{t \in T} \left\{ \sum_{k \in J_{M_i}^{(i)} \setminus J_{m-1}^{(i)}} \tilde{x}_{ik}^{(t)}[n] \right\} - \sum_{m'=m+1}^{M_i} z_{iJ_{m'}^{(i)}}[n]
\]

recursively, starting from $m = M_i$ and proceeding through to $m = 1$.

6. Steps 2-5 are repeated until the sequence of primal iterates $\{\tilde{x}[n]\}$
converges.

Figure 3.5: Summary of the subgradient method.

3.3 Dynamic multicast

In many applications, membership of the multicast group changes in time, with nodes
joining and leaving the group, rather than remaining constant for the duration of the
connection, as we have thus far assumed. Under these dynamic conditions, we often
cannot simply re-establish the connection with every membership change because
doing so would cause an unacceptable disruption in the service being delivered to those
nodes remaining in the group. A good example of an application where such issues
arise is real-time media distribution. Thus, we desire to find minimum-cost
time-varying subgraphs that can deliver continuous service to dynamic multicast groups.

Although our objective is clear, our description of the problem is currently vague.
Indeed, one of the principal hurdles to tackling the problem of dynamic multicast
lies in formulating the problem in such a way that it is suitable for analysis and
addresses our objective. For routed networks, the problem is generally formulated
as the dynamic Steiner tree problem, which was first proposed in jS2]. Under this
formulation, the focus is on worst-case behavior and modifications of the multicast
tree are allowed only when nodes join or leave the multicast group. The formulation is
adequate, but not compelling; indeed, there is no compelling reason for the restriction
on when the multicast tree can be modified.

In our formulation for coded networks, we draw some inspiration from this
formulation, but we focus on expected behavior rather than worst-case behavior, and we do not restrict
modifications of the multicast subgraph to when nodes join or leave the multicast
tree. We formulate the problem as follows.

We employ a basic unit of time that is related to the time that it takes for changes
in the multicast subgraph to settle. In particular, suppose that at a given time the
multicast subgraph is $z$ and that it is capable of supporting a multicast connection
to sink nodes $T$. Then, in one unit time, we can change the multicast subgraph to
$z'$, which is capable of supporting a multicast connection to sink nodes $T'$, without
disrupting the service being delivered to $T \cap T'$ provided that (componentwise) $z \ge z'$
or $z \le z'$. The interpretation of this assumption is that we allow, in one time unit,
only for the subgraph to increase, meaning that any sink node receiving a particular
stream will continue to receive it (albeit with possible changes in the code, depending
on how the coding is implemented) and therefore facing no significant disruption to
service; or for the subgraph to decrease, meaning that any sink node receiving a
particular stream will be forced to reduce to a subset of that stream, but one that
is sufficient to recover the source's transmission provided that the sink node is in $T'$,
and therefore again facing no significant disruption to service. We do not allow for
both operations to take place in a single unit of time (which would allow for arbitrary
changes) because, in that case, sink nodes may face temporary disruptions to service
when decreases to the multicast subgraph follow too closely to increases.

As an example, consider the four-node lossless network shown in Figure 3.6.
Suppose that $s = 1$ and that, at a given time, we have $T = \{2, 4\}$. We support a multicast
of unit rate with the subgraph

\[
(z_{12}, z_{13}, z_{24}, z_{34}) = (1, 0, 1, 0).
\]

Figure 3.6: A four-node lossless network.

Now suppose that the group membership changes, and node 2 leaves while node 3
joins, so $T' = \{3, 4\}$. As a result, we decide that we wish to change to the subgraph

\[
(z_{12}, z_{13}, z_{24}, z_{34}) = (0, 1, 0, 1).
\]

If we simply make the change naively in a single time unit, then node 4 may face
a temporary disruption to its service because packets on $(2,4)$ may stop arriving
before packets on $(3,4)$ start arriving. The assumption that we have made on allowed
operations ensures that we must first increase the subgraph to

\[
(z_{12}, z_{13}, z_{24}, z_{34}) = (1, 1, 1, 1),
\]

allow for the change to settle by waiting for one time unit, then decrease the subgraph
to

\[
(z_{12}, z_{13}, z_{24}, z_{34}) = (0, 1, 0, 1).
\]

With this series of operations, node 4 maintains continuous service throughout the
subgraph change.
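
The rule on allowed operations in the example above can be checked mechanically. The following Python sketch is illustrative only: the function name `allowed_transition` and the tuple ordering $(z_{12}, z_{13}, z_{24}, z_{34})$ are assumptions, and the check covers only the componentwise ordering, not whether each subgraph actually supports the required multicast.

```python
def allowed_transition(z, z_next):
    """True if z_next is componentwise >= z or componentwise <= z."""
    grows = all(b >= a for a, b in zip(z, z_next))
    shrinks = all(b <= a for a, b in zip(z, z_next))
    return grows or shrinks


start = (1, 0, 1, 0)    # serves T  = {2, 4}
target = (0, 1, 0, 1)   # serves T' = {3, 4}
full = (1, 1, 1, 1)     # intermediate subgraph

print(allowed_transition(start, target))  # direct change in one time unit
print(allowed_transition(start, full))    # first increase
print(allowed_transition(full, target))   # then decrease
```

The direct change is rejected because some components increase while others decrease, whereas each of the two steps through the intermediate subgraph is allowed.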

We discretize the time axis into time intervals of a single time unit. We suppose
that, at the beginning of each time interval, we receive zero or more requests from
sink nodes that are not currently part of the multicast group to join and zero or more
requests from sink nodes that are currently part of the multicast group to leave. We
model these join and leave requests as a discrete stochastic process and make the
assumption that, once all the members of the multicast group leave, the connection is
over and remains in that state forever. Let $T_m$ denote the sink nodes in the multicast
group at the end of time interval $m$. Then, we assume that

\[
\lim_{m \to \infty} \Pr(T_m \ne \emptyset \mid T_0 = T) = 0
\tag{3.34}
\]

for any initial multicast group $T$. A possible, simple model of join and leave requests
is to model $|T_m|$ as a birth-death process with a single absorbing state at state 0, and
to choose a node uniformly from $\mathcal{N}' \setminus T_m$, where $\mathcal{N}' := \mathcal{N} \setminus \{s\}$, at each birth and
from $T_m$ at each death.

Let $z^{(m)}$ be the multicast subgraph at the beginning of time interval $m$, which, by
the assumptions made thus far, means that it supports a multicast connection to sink
nodes $T_{m-1}$. Let $V_{m-1}$ and $W_{m-1}$ be the join and leave requests that arrive at the end
of time interval $m - 1$, respectively. Hence, $V_{m-1} \subset \mathcal{N}' \setminus T_{m-1}$, $W_{m-1} \subset T_{m-1}$, and
$T_m = (T_{m-1} \setminus W_{m-1}) \cup V_{m-1}$. We choose $z^{(m+1)}$ from $z^{(m)}$ and $T_m$ using the function
$\mu_m$, so $z^{(m+1)} = \mu_m(z^{(m)}, T_m)$, where $z^{(m+1)}$ must lie in a particular constraint set
$U(z^{(m)}, T_m)$.

To characterize the constraint set $U(z, T)$, recall the optimization problem for
minimum-cost multicast in Section 3.1:

\[
\begin{aligned}
\text{minimize} \quad & f(z) \\
\text{subject to} \quad & z \in Z, \\
& \sum_{j \in K} x_{iJj}^{(t)} \le z_{iJ}\, b_{iJK}, \quad \forall\, (i,J) \in \mathcal{A},\ K \subset J,\ t \in T, \\
& x^{(t)} \in F^{(t)}, \quad \forall\, t \in T.
\end{aligned}
\tag{3.35}
\]
Figure 3.7: A lossless network used for dynamic multicast.

Therefore, it follows that we can write $U(z, T) = U_+(z, T) \cup U_-(z, T)$, where

\[
\begin{aligned}
U_+(z, T) &= \{ z' \in Z(T) \mid z' \ge z \}, \\
U_-(z, T) &= \{ z' \in Z(T) \mid z' \le z \},
\end{aligned}
\]

and $Z(T)$ is the feasible set of $z$ in problem (3.35) for a given $T$, i.e., if we have the
subgraph $z$ at the beginning of a time interval and we must go to a subgraph that
supports multicast to $T$, then the allowable subgraphs are those that support multicast
to $T$ and either increase $z$ (those in $U_+(z, T)$) or decrease $z$ (those in $U_-(z, T)$).

Note that, if we have separable constraints, then $U(z^{(m)}, T_m) \ne \emptyset$ for all $z^{(m)} \in Z$
provided that $Z(T_m) \ne \emptyset$, i.e., from any feasible subgraph at stage $m$, it is possible
to go to a feasible subgraph at stage $m + 1$ provided that one exists for the multicast
group $T_m$. But while this is the case for coded networks, it is not always the case for
routed networks. Indeed, if multiple multicast trees are being used (as discussed in
[109], for example), then it is definitely possible to find ourselves in a state where we
cannot achieve multicast at stage $m + 1$ even though static multicast to $T_m$ is possible
using multiple multicast trees.

As an example of this phenomenon, consider the lossless network depicted in
Figure 3.7. Suppose that each arc is of unit capacity, that $s = 1$, and that, at a
given time, we have $T = \{6, 8\}$. We support a multicast of rate 2 with the two
trees $\{(1,3), (3,4), (4,5), (5,6), (5,7), (7,8)\}$ and $\{(1,2), (2,6), (6,8)\}$, each carrying
unit rate. Now suppose that the group membership changes, and node 6 leaves while
node 7 joins, so $T' = \{7, 8\}$. It is clear that static multicast to $T'$ is possible using
multiple multicast trees (we simply reflect the solution for $T$), but we cannot achieve
multicast to $T'$ by only adding edges to the two existing trees. Our only recourse
at this stage is to abandon the existing trees and establish new ones, which causes a
disruption to node 8's service, or to reconfigure slowly the existing trees, which causes
a delay before node 7 is actually joined to the group.

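Returning to the problem at hand, our objective is to find a policy
$\pi = \{\mu_0, \mu_1, \ldots\}$ that minimizes the cost function

\[
J_\pi(z^{(0)}, T_0) = \lim_{M \to \infty} E\left[ \sum_{m=0}^{M-1}
  f(z^{(m+1)})\, \chi_{2^{\mathcal{N}'} \setminus \{\emptyset\}}(T_m) \right],
\]

where $\chi_{2^{\mathcal{N}'} \setminus \{\emptyset\}}$ is the characteristic function for
$2^{\mathcal{N}'} \setminus \{\emptyset\}$ (i.e., $\chi_{2^{\mathcal{N}'} \setminus \{\emptyset\}}(T) = 1$
if $T \neq \emptyset$, and $\chi_{2^{\mathcal{N}'} \setminus \{\emptyset\}}(T) = 0$ if $T = \emptyset$).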

We impose the assumption that we have separable constraints and that
$Z(\mathcal{N}') \neq \emptyset$, i.e., we assume that there exists a subgraph that
supports broadcast. This assumption ensures that the constraint set $U(z,T)$ is
non-empty for all $z \in Z$ and $T \subset \mathcal{N}'$. Thus, from condition (3.34),
it follows that there exists at least one policy $\pi$ such that
$J_\pi(z^{(0)}, T_0) < \infty$ (namely, one that uses some fixed
$z \in Z(\mathcal{N}')$ until the multicast group is empty).

It is now not difficult to see that we are dealing with an undiscounted,
infinite-horizon dynamic programming problem (see, e.g., ^2 Chapter 3]), and we can
apply the theory developed for such problems to our problem. So doing, we first
note that the optimal cost function $J^* := \min_\pi J_\pi$ satisfies Bellman's
equation, namely, we have

\[
J^*(z,T) = \min_{u \in U(z,T)} \left\{ f(u) + E[J^*(u, (T \setminus V) \cup W)] \right\}
\]

if $T \neq \emptyset$, and $J^*(z,T) = 0$ if $T = \emptyset$. Moreover, the optimal
cost is achieved by the stationary policy $\pi = \{\mu, \mu, \ldots\}$, where $\mu$ is
given by

\[
\mu(z,T) = \operatorname*{arg\,min}_{u \in U(z,T)} \left\{ f(u) + E[J^*(u, (T \setminus V) \cup W)] \right\}
\tag{3.36}
\]

if $T \neq \emptyset$, and $\mu(z,T) = 0$ if $T = \emptyset$.

The fact that the optimal cost can be achieved by a stationary policy limits the
space in which we need to search for optimal policies significantly, but we are still
left with the difficulty that the state space is uncountably large: it is the space of
all possible pairs $(z,T)$, which is $Z \times 2^{\mathcal{N}'}$. The size of the state
space more or less eliminates the possibility of using techniques such as value
iteration to obtain $J^*$.
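
To make Bellman's equation concrete, the following sketch runs value iteration on a
deliberately tiny, finite instance. Everything in it is an illustrative assumption
rather than part of the formulation above: the subgraph z is restricted to binary
vectors over two arcs, sink t is taken to be served exactly when arc t is switched on,
sinks are assumed to leave and join independently with the hypothetical probabilities
P_LEAVE and P_JOIN, and the empty group is treated as terminal. With a real-valued z
the state space is uncountable and this kind of enumeration is not available, which
is the difficulty noted above.

```python
from itertools import product

# Toy instance (all values are illustrative assumptions, not from the thesis).
SINKS = (0, 1)                 # candidate sink nodes; sink t is served by arc t
COST = (1.0, 2.0)              # per-arc cost, so f(z) = sum_k COST[k] * z[k]
P_LEAVE, P_JOIN = 0.5, 0.2     # assumed independent leave/join probabilities

ALL_Z = list(product((0, 1), repeat=len(SINKS)))
ALL_T = [frozenset(t for t, b in zip(SINKS, bits) if b)
         for bits in product((0, 1), repeat=len(SINKS))]


def f(z):
    """Linear, separable cost of subgraph z."""
    return sum(c * zk for c, zk in zip(COST, z))


def supports(z, T):
    """z is in Z(T): every sink in T has its arc switched on."""
    return all(z[t] == 1 for t in T)


def allowable(z, T):
    """U(z, T): subgraphs in Z(T) that only increase or only decrease z."""
    def ge(a, b):
        return all(x >= y for x, y in zip(a, b))
    return [u for u in ALL_Z if supports(u, T) and (ge(u, z) or ge(z, u))]


def next_groups(T):
    """Distribution of the next multicast group under the assumed model."""
    out = {}
    for bits in product((0, 1), repeat=len(SINKS)):
        prob = 1.0
        for t, b in zip(SINKS, bits):
            p_in = (1 - P_LEAVE) if t in T else P_JOIN
            prob *= p_in if b else 1 - p_in
        T_next = frozenset(t for t, b in zip(SINKS, bits) if b)
        out[T_next] = out.get(T_next, 0.0) + prob
    return out


def q_value(J, u, T):
    """Stage cost f(u) plus expected cost-to-go after moving to subgraph u."""
    return f(u) + sum(p * J[(u, Tn)] for Tn, p in next_groups(T).items())


def bellman_sweep(J):
    """One application of Bellman's equation; J(z, T) = 0 when T is empty."""
    return {(z, T): 0.0 if not T else min(q_value(J, u, T)
                                          for u in allowable(z, T))
            for z in ALL_Z for T in ALL_T}


def value_iteration(sweeps=500):
    J = {(z, T): 0.0 for z in ALL_Z for T in ALL_T}
    for _ in range(sweeps):
        J = bellman_sweep(J)
    return J


def greedy_policy(J, z, T):
    """Stationary policy: the minimizer in Bellman's equation at (z, T)."""
    return min(allowable(z, T), key=lambda u: q_value(J, u, T))


J_star = value_iteration()
```

In this toy instance, a state such as z = (1, 0) with T = {1} leaves only u = (1, 1)
in U(z, T), because (0, 1) neither increases nor decreases z; this is the same
restriction that the sets $U_+$ and $U_-$ impose in the general problem.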

On the other hand, given $J^*$, it does not seem at all implausible that we can
compute the optimal decision at the beginning of each time interval using (3.36).
The constraint set is the union of two polyhedra, which can simply be handled by
optimizing over each separately. The objective function can pose a difficulty because,
even if $f$ is convex, the objective may not be convex owing to the term
$E[J^*(u, (T \setminus V) \cup W)]$. But, since we are unable to obtain $J^*$
precisely on account of the large state space, we can restrict our attention to
approximations that make problem (3.36) tractable.

For dynamic programming problems, there are many approximations that have
been developed to cope with large state spaces (see, e.g., [TTl Section 2.3.3]). In
particular, we can approximate $J^*(z,T)$ by $\tilde{J}(z,T,r)$, where
$\tilde{J}(z,T,r)$ is of some fixed form, and $r$ is a parameter vector that is
determined by some form of optimization, which can be performed offline if the
graph $\mathcal{G}$ is static. Depending upon the approximation that is used, we may
even be able to solve problem (3.36) using the distributed algorithms described in
Section 3.2 (or simple modifications thereof). The specific approximations
$\tilde{J}(z,T,r)$ that we can use and their performance are beyond the scope of
this thesis.



Chapter 4 



Performance Evaluation 



In the preceding two chapters, we laid out a solution to the efficient operation
problem for coded packet networks. The solution we described has several attractive
properties. In particular, it can be computed in a distributed manner and, in
many cases, it is possible to solve the problem, as we have defined it in Section 1.2,
optimally for a single multicast connection. But the question remains, is it actually
useful? Is there a compelling reason to abandon the routed approach, with which we
have so much experience, in favor of a new one?

We believe that for some applications the answer to both questions is yes and, in 
this chapter, we report on the results of several simulations that we conducted to as- 
sess the performance of the proposed techniques in situations of interest. Specifically, 
we consider three problems: 

1. minimum-transmission wireless unicast: the problem of establishing a unicast 
connection in a lossy wireless network using the minimum number of transmis- 
sions per message packet; 

2. minimum-weight wireline multicast: the problem of establishing a multicast
connection in a lossless wireline network using the minimum weight, or artificial
cost, per message packet;

3. minimum-energy wireless multicast: the problem of establishing a multicast
connection in a lossless wireless network using the minimum amount of energy
per message packet.

We deal with these problems in Sections 4.1, 4.2, and 4.3, respectively. We find that
lossy wireless networks generally offer the most potential for the proposed techniques
to improve on existing ones and that these improvements can indeed be significant.

4.1 Minimum-transmission wireless unicast 

Establishing a unicast connection in a lossy wireless network is not trivial. Packets 
are frequently lost, and some mechanism to ensure reliable communication is required. 
Such a mechanism should not send packets unnecessarily, and we therefore consider 
the objective of minimizing the total number of transmissions per message packet. 

There are numerous approaches to wireless unicast; we consider five, three of
which (approaches 1 to 3) are routed approaches and two of which (approaches 4 and 5)
are coded approaches:

1. End-to-end retransmission: A path is chosen from source to sink, and packets
are acknowledged by the sink, or destination node. If the acknowledgment
for a packet is not received by the source, the packet is retransmitted. This
represents the situation where reliability is provided by a retransmission scheme
above the link layer, e.g., by the transmission control protocol (TCP) at the
transport layer, and no mechanism for reliability is present at the link layer.

2. End-to-end coding: A path is chosen from source to sink, and an end-to-end
forward error correction (FEC) code, such as a Reed-Solomon code, an LT code
[B^, or a Raptor code [Hllinni, is used to correct for packets lost between source
and sink. This is the Digital Fountain approach to reliability.

3. Link-by-link retransmission: A path is chosen from source to sink, and ARQ
is used at the link layer to request the retransmission of packets lost on every
link in the path. Thus, on every link, packets are acknowledged by the intended
receiver and, if the acknowledgment for a packet is not received by the sender,
the packet is retransmitted.

4. Path coding: A path is chosen from source to sink, and every node on the
path employs coding to correct for lost packets. The most straightforward way of
doing this is for each node to use an FEC code, decoding and re-encoding packets
it receives. The drawback of such an approach is delay. Every node on the path
codes and decodes packets in a block. A way of overcoming this drawback is
to use codes that operate in more of a "convolutional" manner, sending out
coded packets formed from packets received thus far, without decoding. The
random linear coding scheme of Section 2.1 is such a code. A variation, with
lower complexity, is described in |HH].

5. Full coding: In this case, paths are eschewed altogether, and we use our
solution to the efficient operation problem. Problem (3.2) is solved to find a
subgraph, and the random linear coding scheme of Section 2.1 is used. This
represents the limit of achievability provided that we are restricted from
modifying the design of the physical layer and that we do not exploit the timing of
packets to convey information.

4.1.1 Simulation set-up 

Nodes were placed randomly according to a uniform distribution over a square region.
The size of the square was set to achieve unit node density. We considered a network
where transmissions were subject to distance attenuation and Rayleigh fading, but
not interference (owing to scheduling). So, when node $i$ transmits, the signal-to-noise
ratio (SNR) of the signal received at node $j$ is $\gamma\, d(i,j)^{-\alpha}$, where
$\gamma$ is an exponentially-distributed random variable with unit mean, $d(i,j)$ is
the distance between node $i$ and node $j$, and $\alpha$ is an attenuation parameter
that we took to be 2. We assumed that a packet transmitted by node $i$ is
successfully received by node $j$ if the received SNR exceeds $\beta$, i.e.,
$\gamma\, d(i,j)^{-\alpha} > \beta$, where $\beta$ is a threshold that we took to be
1/4. If a packet is not successfully received, then it is completely lost. If
acknowledgments are sent, acknowledgments are subject to loss in the same way that
packets are and follow the reverse path.

Figure 4.1: Average number of transmissions per packet as a function of network size
for various wireless unicast approaches.
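
Under the fading model described above, the probability that a single transmission
over distance d is received has a closed form: because the fading variable is
exponentially distributed with unit mean, the probability that the received SNR
exceeds the threshold is exp(-beta * d^alpha). The sketch below evaluates this
expression and cross-checks it by simulation. The function names, the sample count,
and the fixed seed are illustrative choices and are not taken from the simulation
code used for the results in this chapter.

```python
import math
import random


def reception_probability(d, alpha=2.0, beta=0.25):
    """P(gamma * d**-alpha > beta) for gamma ~ Exp(1): exp(-beta * d**alpha)."""
    return math.exp(-beta * d ** alpha)


def simulated_reception_rate(d, alpha=2.0, beta=0.25, trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability (illustrative check only)."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(1.0) * d ** -alpha > beta for _ in range(trials))
    return hits / trials


for d in (0.5, 1.0, 2.0):
    print(d, round(reception_probability(d), 4),
          round(simulated_reception_rate(d), 4))
```

With an attenuation parameter of 2 and a threshold of 1/4, a link of unit length
succeeds with probability exp(-1/4), roughly 0.78, and a link of length 2 with
probability exp(-1), roughly 0.37.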

4.1.2 Simulation results 

The average number of transmissions required per packet using the various approaches
in random networks of varying size is shown in Figure 4.1. Paths or subgraphs
were chosen in each random instance to minimize the total number of transmissions
required, except in the cases of end-to-end retransmission and end-to-end coding,
where they were chosen to minimize the number of transmissions required by the
source node (the optimization to minimize the total number of transmissions in these
cases cannot be done straightforwardly by a shortest path algorithm). We see that,
while end-to-end coding and link-by-link retransmission already represent significant
improvements on end-to-end retransmission, the coded approaches represent more
significant improvements still. By a network size of nine nodes, full coding already
improves on link-by-link retransmission by a factor of two. Moreover, as the network
size grows, the performance of the various schemes diverges.

Here, we discuss performance simply in terms of the number of transmissions
required per packet; in some cases, e.g., congestion, the performance measure increases
super-linearly in this quantity, and the performance improvement is even greater than
that depicted in Figure 4.1. We see, at any rate, that our prescription for efficient
operation promises significant improvements, particularly for large networks.

4.2 Minimum-weight wireline multicast

A common networking problem is that of minimizing the weight of a multicast
connection in a lossless wireline network, where the weight of the connection is
determined by weights, or artificial costs, placed on links to direct the flow of
traffic. Since we consider a wireline network, the links are all point-to-point and
all hyperarcs are simple arcs. The cost function is linear and separable, namely, it
is $f(z) = \sum_{(i,j) \in \mathcal{A}} a_{ij} z_{ij}$, where $a_{ij}$ is the weight of
the link represented by arc $(i,j)$. The constraint set $Z$ is the entire positive
orthant, since it is generally assumed that the rate of the connection is much
smaller than the capacity of the network.

For routed networks, the standard approach to establishing minimum-weight
multicast connections is to find the shortest tree rooted at the source that reaches
all the sinks, which equates to solving the Steiner tree problem on directed graphs
(HH]. For coded networks, we see that the optimization problem is, in this case, a
linear optimization problem and, as such, admits a polynomial-time solution. By
contrast, the Steiner tree problem on directed graphs is well known to be
NP-complete. Although tractable approximation algorithms exist for the Steiner tree
problem on directed graphs (e.g., [221 IHH CHI), solutions thus obtained are
suboptimal relative to minimum-weight multicast without coding, which in turn is
suboptimal relative to minimum-weight multicast with coding, since coding subsumes
forwarding and replicating. Thus, coding promises potentially significant weight
improvements.
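
A small example illustrates how coding can undercut the best tree. The network below
is a hypothetical illustration, not one of the networks studied in this thesis: a
source s feeds three relays, and each of three sinks is fed by a distinct pair of
relays, with unit weight on every arc. The sketch checks that the subgraph placing
rate 1/2 on every arc supports a unit-rate coded multicast (the max-flow from s to
each sink is 1) at weight 4.5, whereas a brute-force search finds that the lightest
set of whole arcs reaching all sinks has weight 5. The max-flow routine is a plain
Edmonds-Karp implementation written for this example; the subgraph shown is merely a
feasible one and is not claimed here to be the optimum of the linear program.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical network: source "s", relays 1-3, sinks "t1"-"t3"; unit weights.
ARCS = [("s", 1), ("s", 2), ("s", 3),
        (1, "t1"), (2, "t1"), (2, "t2"), (3, "t2"), (1, "t3"), (3, "t3")]
SINKS = ("t1", "t2", "t3")


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow; `capacity` maps arcs (u, v) to capacities."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), Fraction(0))
    total = Fraction(0)
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for (a, b), cap in residual.items():
                if a == u and cap > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if sink not in parent:
            return total
        # Collect the augmenting path and push its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[arc] for arc in path)
        for (u, v) in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        total += push


def reaches_all_sinks(arcs):
    """True if every sink is reachable from the source using only `arcs`."""
    seen, stack = {"s"}, ["s"]
    while stack:
        u = stack.pop()
        for (a, b) in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return all(t in seen for t in SINKS)


# Coded multicast: rate 1/2 on every arc.
z = {arc: Fraction(1, 2) for arc in ARCS}
coded_weight = sum(z.values())
coded_rate = min(max_flow(z, "s", t) for t in SINKS)

# Routed multicast: lightest set of whole arcs reaching all sinks.
tree_weight = next(k for k in range(len(ARCS) + 1)
                   if any(reaches_all_sinks(c) for c in combinations(ARCS, k)))

print(coded_rate, coded_weight, tree_weight)
```

Every tree in this example needs three arcs into the sinks and at least two arcs out
of the source, since a single relay feeds only two sinks; the coded subgraph instead
spreads half a unit of rate over all nine arcs.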

4.2.1 Simulation set-up 

We conducted simulations where we took graphs representing various internet service
provider (ISP) networks and assessed the average total weight of random multicast
connections using, first, our network-coding based solution to the efficient
operation problem and, second, routing over the tree given by the directed Steiner
tree (DST) approximation algorithm described in [22]. The graphs, and their
associated link weights, were obtained from the Rocketfuel project of the University
of Washington jHOl. The approximation algorithm in [22] was chosen for comparison as
it achieves a poly-logarithmic approximation ratio (it achieves an approximation
ratio of $O(\log^2 |T|)$, where $|T|$ is the number of sink nodes), which is roughly
as good as can be expected from any practical algorithm, since it has been shown that
it is highly unlikely that there exists a polynomial-time algorithm that can achieve
an approximation factor smaller than logarithmic [5^.

4.2.2 Simulation results 

The results of the simulations are tabulated in Table 4.1. We see that, depending
on the network and the size of the multicast group, the average weight reduction
ranges from 10% to 33%. Though these reductions are modest, it is important to
keep in mind that our solution easily accommodates distributed operation and, by
contrast, computing Steiner trees is generally done at a single point with full
network knowledge.

Network          Approach             Average multicast weight
                                      2 sinks   4 sinks   8 sinks   16 sinks
Telstra (au)     DST approximation    17.0      28.9      41.7      62.8
                 Network coding       13.5      21.5      32.8      48.0
Sprint (us)      DST approximation    30.2      46.5      71.6      127.4
                 Network coding       22.3      35.5      56.4      103.6
Ebone (eu)       DST approximation    28.2      43.0      69.7      115.3
                 Network coding       20.7      32.4      50.4      77.8
Tiscali (eu)     DST approximation    32.6      49.9      78.4      121.7
                 Network coding       24.5      37.7      57.7      81.7
Exodus (us)      DST approximation    43.8      62.7      91.2      116.0
                 Network coding       33.4      49.1      68.0      92.9
Abovenet (us)    DST approximation    27.2      42.8      67.3      75.0
                 Network coding       21.8      33.8      60.0      67.3

Table 4.1: Average weights of random multicast connections of unit rate and varying
size for various approaches in graphs representing various ISP networks.
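
The quoted range of weight reductions can be recomputed directly from the entries of
Table 4.1. The sketch below is only a convenience check on the tabulated averages;
the dictionary layout and the function name are illustrative.

```python
# Average multicast weights from Table 4.1 for 2, 4, 8, and 16 sinks, stored as
# (DST approximation row, network coding row) for each network.
TABLE_4_1 = {
    "Telstra (au)":  ([17.0, 28.9, 41.7, 62.8],  [13.5, 21.5, 32.8, 48.0]),
    "Sprint (us)":   ([30.2, 46.5, 71.6, 127.4], [22.3, 35.5, 56.4, 103.6]),
    "Ebone (eu)":    ([28.2, 43.0, 69.7, 115.3], [20.7, 32.4, 50.4, 77.8]),
    "Tiscali (eu)":  ([32.6, 49.9, 78.4, 121.7], [24.5, 37.7, 57.7, 81.7]),
    "Exodus (us)":   ([43.8, 62.7, 91.2, 116.0], [33.4, 49.1, 68.0, 92.9]),
    "Abovenet (us)": ([27.2, 42.8, 67.3, 75.0],  [21.8, 33.8, 60.0, 67.3]),
}


def reductions(table):
    """Relative weight reduction of network coding over the DST approximation."""
    return {
        network: [(dst - nc) / dst for dst, nc in zip(dst_row, nc_row)]
        for network, (dst_row, nc_row) in table.items()
    }


all_reductions = [r for row in reductions(TABLE_4_1).values() for r in row]
print(f"{min(all_reductions):.1%} to {max(all_reductions):.1%}")
```

The smallest reduction comes from the 16-sink Abovenet entry and the largest from the
16-sink Tiscali entry, which round to the 10% and 33% quoted above.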



4.3 Minimum-energy wireless multicast 

Another problem of interest is that of minimum-energy multicast (see, e.g., [HHIEIZ]).
In this problem, we wish to achieve minimum-energy multicast in a lossless wireless
network without explicit regard for throughput or bandwidth, so the constraint set $Z$
is again the entire positive orthant. The cost function is linear and separable, namely,
it is $f(z) = \sum_{(i,J) \in \mathcal{A}} a_{iJ} z_{iJ}$, where $a_{iJ}$ represents the
energy required to transmit a packet to nodes in $J$ from node $i$. Hence problem (3.3)
becomes a linear optimization problem with a polynomial number of constraints, which
can therefore be solved in polynomial time. By contrast, the same problem using
traditional routing-based approaches is NP-complete; in fact, the special case of
broadcast in itself is NP-complete, a result shown in [HHIE]. The problem must
therefore be addressed using polynomial-time heuristics such as the Multicast
Incremental Power (MIP) algorithm proposed in [107].

Network size    Approach          Average multicast energy
                                  2 sinks   4 sinks   8 sinks   16 sinks
20 nodes        MIP algorithm     30.6      33.8      41.6      47.4
                Network coding    15.5      23.3      29.9      38.1
30 nodes        MIP algorithm     26.8      31.9      37.7      43.3
                Network coding    15.4      21.7      28.3      37.8
40 nodes        MIP algorithm     24.4      29.3      35.1      42.3
                Network coding    14.5      20.6      25.6      30.5
50 nodes        MIP algorithm     22.6      27.3      32.8      37.3
                Network coding    12.8      17.7      25.3      30.3

Table 4.2: Average energies of random multicast connections of unit rate and varying
size for various approaches in random wireless networks of varying size.

4.3.1 Simulation set-up 

We conducted simulations where we placed nodes randomly, according to a uniform
distribution, in a 10 x 10 square with a radius of connectivity of 3 and assessed the
average total energy of random multicast connections using, first, our network-coding
based solution to the efficient operation problem and, second, the routing solution
given by the MIP algorithm. The energy required to transmit at unit rate to a distance
$d$ was taken to be $d^2$.
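
One way to turn node positions into per-hyperarc energy costs is sketched below,
taking the energy needed to reach distance d at unit rate to be d squared. The sketch
assumes, as is natural for an omnidirectional broadcast but not spelled out above,
that reaching every node in a set costs as much as reaching the farthest one, and that
nodes beyond the radius of connectivity cannot be reached at all. The function name
and the sample coordinates are hypothetical.

```python
import math

RADIUS = 3.0      # radius of connectivity used in the simulations
EXPONENT = 2      # energy to reach distance d at unit rate taken as d**2


def hyperarc_energy(sender, receivers, positions):
    """Energy for `sender` to reach every node in `receivers` with one broadcast.

    Assumption: the cost is set by the farthest receiver; returns None if any
    receiver lies outside the radius of connectivity.
    """
    farthest = max(math.dist(positions[sender], positions[j]) for j in receivers)
    if farthest > RADIUS:
        return None
    return farthest ** EXPONENT


# Hypothetical node coordinates, for illustration only.
positions = {"i": (0.0, 0.0), "j": (1.0, 0.0), "k": (0.0, 2.0), "m": (4.0, 0.0)}
print(hyperarc_energy("i", {"j"}, positions))
print(hyperarc_energy("i", {"j", "k"}, positions))
print(hyperarc_energy("i", {"m"}, positions))
```

Under this assumption, adding a nearer node to a hyperarc that already reaches a
farther one costs nothing extra, which is the broadcast property that the hyperarc
formulation is designed to capture.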

4.3.2 Simulation results 

The results of the simulations are tabulated in Table 4.2. We see that, depending
on the size of the network and the size of the multicast group, the average energy
reduction ranges from 13% to 49%. These reductions are more substantial than those
reported in Section 4.2.2 but are still somewhat modest. Again, it is important to
keep in mind that our solution easily accommodates distributed operation.

In Table 4.3, we tabulate the behavior of a distributed approach, specifically, an
approach using the subgradient method (applied to problem (3.26)). The algorithm
was run under step sizes given by $\theta[n] = n^{-0.8}$ and convex combination
weights by $\mu_l[n] = 1/n$, if $n < 30$, and $\mu_l[n] = 1/30$, if $n \ge 30$. We
refer to this choice of parameters as the case of "modified primal recovery". Note
that, despite our aiming to run sufficiently many trials to ascertain the true
average with high probability, the simulations reported in Table 4.3 do not agree
exactly with those in Table 4.2 because they were run on different sets of random
instances.

Network size   Number of sinks   Average multicast energy
                                 Optimal   25 iterations   50 iterations   75 iterations   100 iterations
30 nodes       2                 16.2      16.7            16.3            16.3            16.2
               4                 21.8      24.0            22.7            22.3            22.1
               8                 27.8      31.9            29.9            29.2            28.8
40 nodes       2                 14.4      15.0            14.5            14.5            14.4
               4                 18.9      21.8            21.2            19.6            19.4
               8                 25.6      31.5            29.2            28.0            27.4
50 nodes       2                 12.4      13.1            12.6            12.5            12.5
               4                 17.4      20.7            18.9            18.2            18.0
               8                 22.4      29.0            26.8            25.5            24.8

Table 4.3: Average energies of random multicast connections of unit rate and varying
size for the subgradient method in random wireless networks of varying size. The
optimal energy was obtained using a linear program solver.

Figure 4.2: Average energy as a function of the number of iterations for the
subgradient method on random 4-sink multicast connections of unit rate in random
30-node wireless networks.

Figure 4.3: Average energy as a function of the number of iterations for the
subgradient method on random 8-sink multicast connections of unit rate in random
50-node wireless networks.

Our first choice of parameters was step sizes given by $\theta[n] = n^{-0.8}$ and
convex combination weights by $\mu_l[n] = 1/n$. This case, which we refer to as
"original primal recovery", was found to suffer adversely from the effect of poor
primal solutions obtained in early iterations. In Figures 4.2 and 4.3, we show the
behavior of the subgradient method in the cases of a 4-sink multicast in a 30-node
network and an 8-sink multicast in a 50-node network, respectively, in detail. In
these figures, we show both parameter choices, and we see that modified primal
recovery performs substantially better. For reference, the optimal energy of the
problem is also shown.
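
The difference between the two parameter choices can be seen on a synthetic sequence
of primal iterates. The sketch below assumes that the recovered primal solution at
iteration n is the weighted combination of the iterates produced so far, and that in
the modified scheme the weight 1/30 is placed on the 30 most recent iterates only, so
that the weights still sum to one. Both points are an interpretation rather than a
restatement of the algorithm, and the iterate values are made up for illustration.

```python
WINDOW = 30  # size of the averaging window in "modified primal recovery"


def original_recovery(iterates):
    """Weights of 1/n on every iterate: the plain average of all iterates so far."""
    return sum(iterates) / len(iterates)


def modified_recovery(iterates, window=WINDOW):
    """Plain average while fewer than `window` iterates exist; afterwards the
    average of the last `window` iterates (assumed interpretation)."""
    recent = iterates[-window:]
    return sum(recent) / len(recent)


# Synthetic iterates: poor early primal solutions, then near-optimal ones.
iterates = [100.0] * 10 + [1.0] * 90

print(original_recovery(iterates))   # still pulled up by the early iterates
print(modified_recovery(iterates))   # early iterates no longer contribute
```

On this synthetic sequence the plain average after 100 iterations is still well above
the later iterates, whereas the windowed average has discarded the first ten, which
matches the qualitative behavior described for the two schemes.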

We see that the subgradient method yields solutions that converge rapidly to an 
optimal one, and it appears to be a promising candidate for the basis of a protocol. 



Chapter 5 
Conclusion 



Routing is undoubtedly a satisfactory way to operate packet networks. It clearly
works. What is not clear is whether it should be used for all types of networks. 
As we mentioned, coding is a definite alternative at least for application-layer over- 
lay networks and multi-hop wireless networks. To actually use coding, however, we 
must apply to coding the same considerations that we apply to routing. This thesis 
was motivated by exactly that. We took the basic premise of coding and addressed 
a fundamental problem in packet networks — efficient operation. We laid out a so- 
lution to the efficient operation problem, defined as it was to factor in packet loss, 
packet broadcast, and asynchronism in packet arrivals. That, we believe, is our main 
contribution. 

From here, there is promising work both in expanding the scope of the problem
and in examining the problem more deeply. We discuss first the former. One way of
expanding the scope of the problem is by including more considerations from
networking. In particular, an important issue outside the present scope is flow, or
congestion, control. We have taken, as a starting point, messages admitted into the
network at given rates and left aside the problem of determining which messages to
admit and at what rates. This problem can be dealt with separately, e.g., using
window flow control as in TCP, but it need not be. In routed packet networks, flow
control can be done jointly with optimal routing (see fH^ Section 6.5.1]), and it may
likewise be possible to extend the subgraph selection techniques that we proposed so
that they jointly perform subgraph selection and flow control. Indeed, an extension of
the primal-dual method of Section 3.2.1 to perform joint subgraph selection and flow
control is given in [7^1 Section II-C]. Even if flow control is done separately, there
has not, to our knowledge, been an earnest study of the flow control problem for coded
packet networks.

As for examining the efficient operation problem more deeply, there are funda- 
mental open questions relating to both network coding and subgraph selection. Let 
us first discuss network coding. As we mentioned in Section 2.4, the random linear
coding scheme that we proposed as a solution to the network coding problem is good 
in that it maximizes throughput. But throughput may not be our principal concern. 
Other performance metrics that may be important are memory usage, computational 
load, and delay. Moreover, feedback may be present. Our true desire, then, is to op- 
timize over a five-dimensional space whose five axes are throughput, memory usage, 
computational load, delay, and feedback usage. 

Some points in this five-dimensional space are known. We know, e.g., that random
linear coding achieves maximum throughput; we can calculate or estimate its
memory usage, computational load, and delay; and we know that its feedback usage
is minimal or non-existent. For networks consisting only of point-to-point links,
we have two other useful points. We know that, by using a retransmission scheme
on each link (i.e., acknowledging the reception of packets on every link and
retransmitting unacknowledged packets), we achieve maximum throughput and minimum
memory usage, computational load, and delay at the cost of high feedback usage (we
require a reliable feedback message for every received packet). We know also that,
by using a low-complexity erasure code on each link (e.g., a Raptor code jHH 12^] or
an LT code ^^), we trade off, with respect to random linear coding, computational
load for delay. The challenge is to fill out this space more. In the context of channel
coding, such a challenge might seem absurd, an overly ambitious proposition. But,
as the slotted Aloha relay channel (see Section 1.2.1) illustrates, network coding is
different from channel coding, and problems intractable for the latter may not be for
the former. A preliminary attempt at tackling this problem is made in [85].

Let us discuss, now, open questions relating to subgraph selection. In this thesis, 
we gave distributed algorithms that apply only if the constraints caused by medium 
access issues can essentially be disregarded. But these issues are important and often 
must be dealt with, and it remains to develop distributed algorithms that incorpo- 
rate such issues explicitly. A good starting point would be to develop distributed 
algorithms for slotted Aloha networks of the type described in Section 1.2.1.

Much potential for investigation is also present in the cases for which our
algorithms do apply. Other distributed algorithms are given in |2H1, [108], [112],
[113], and no doubt more still can be developed. For example, our choice to
approximate the maximum function with an $l^m$-norm in Section 3.2 is quite
arbitrary, and it seems likely that there are other approximations that yield good,
and possibly even better, distributed algorithms.

No matter how good the distributed algorithm, however, there will be some
overhead in terms of information exchange and computation. What we would like ideally
is to perform the optimization instantly without any overhead. That goal is
impossible, but, failing that, we could content ourselves with optimization methods
that have low overhead and fall short of the optimal cost. From such a suboptimal
solution, we could then run a distributed algorithm to bring us to an optimal
solution or, simply, use the suboptimal solution. A suboptimal, but simple, subgraph
selection method for minimum-energy broadcast in coded wireless networks is given in
[106]. Little else has been done. It might seem contradictory that we started this
thesis by lamenting the use of ad hoc methods and heuristics, yet we now gladly
contemplate their use. There is a difference, however, between proposing the use of
ad hoc methods when the optimum is unknown or poorly defined and doing so when the
optimum is known but simply cannot be achieved practically. What we now call for is
the latter.

Another point about the algorithms we have proposed is that they optimize based
on rates: rates of the desired connections and rates of packet injections. But we
do not necessarily need to optimize based on rates, and there is a body of work in
networking theory where subgraph selection is done using queue lengths rather than
rates [7| ll02j. This work generally relates to routed networks, and the first that
applies to coded networks is jSl]. Adding such queue-length based optimization
methods to our space of mechanisms for subgraph selection may prove useful in our
search for practical methods. What we would like to know, ideally, is the most
practical method for network x, given its particular capabilities and constraints.
This might or might not be one of the methods proposed in this thesis; determining
whether it is, and what the most practical method is if it is not, is the challenge.

This drive toward practicality fits with the principal motivation of this thesis:
we saw coding as a promising practical technique for packet networks, so we studied
it. And we believe, on the basis of our results, that our initial hypothesis has been
confirmed. Realizing coded packet networks, therefore, is a worthwhile goal, and
we see our work as an integral step toward this goal. But that is not our only
goal: Gallager's comment on the "art" of networking (see Chapter 1) is, we believe,
indicative of a general consensus that current understanding of data networks is poor,
at least in relation to current understanding of other engineered systems, such as
communication channels. There is no clear reason why this disparity of understanding
must exist, and the advances of networking theory have done much to reduce its
extent. The study of coded networks may reduce the disparity further: as we have
seen in this thesis, we are, in the context of coded packet networks, able to find
optimal solutions to previously intractable problems. This goal, of increasing our
general understanding, is one of the goals of this thesis, and we hope to spawn more
work toward this goal. Perhaps coding may be the ingredient necessary to finally put
our understanding of data networks on par with our understanding of communication